I'm making an open source C# library for other developers to use. My key concern is ease of use. This means using intuitive names, intuitive method usage and such.

This is the first time I've done something with other people in mind, so I'm really concerned about the quality of the architecture. Plus, I wouldn't mind learning a thing or two. :)

I have three classes: Downloader, Parser and Movie

I was thinking that it would be best to only expose the Movie class of my library and have Downloader and Parser remain hidden from invocation.

Ultimately, I see my library being used like this.

using FreeIMDB;

public void Test()
{
    var MyMovie = Movie.FindMovie("The Matrix");
    //Now MyMovie would have all its fields set and ready for the big show.
}

Can you review how I'm planning this, and point out any wrong judgement calls I've made and where I could improve?

Remember, my main concern is ease of use.

Movie.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Drawing;


namespace FreeIMDB
{
    public class Movie
    {
        public Image Poster { get; set; }
        public string Title { get; set; }
        public DateTime ReleaseDate { get; set; }
        public string Rating { get; set; }
        public string Director { get; set; }
        public List<string> Writers { get; set; }
        public List<string> Genres { get; set; }
        public string Tagline { get; set; }
        public string Plot { get; set; }
        public List<string> Cast { get; set; }
        public string Runtime { get; set; }
        public string Country { get; set; }
        public string Language { get; set; }

        // Static, so it can be called as Movie.FindMovie("...") as in the usage example.
        public static Movie FindMovie(string Title)
        {
            Movie film = new Movie();
            Parser parser = Parser.FromMovieTitle(Title);

            film.Poster = parser.Poster();
            film.Title = parser.Title();
            film.ReleaseDate = parser.ReleaseDate();
            //And so on and so forth.

            return film;
        }

        public static Movie FindKnownMovie(string ID)
        {
            Movie film = new Movie();
            Parser parser = Parser.FromMovieID(ID);

            film.Poster = parser.Poster();
            film.Title = parser.Title();
            film.ReleaseDate = parser.ReleaseDate();
            //And so on and so forth.

            return film;
        }
    }
}

Parser.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace FreeIMDB
{
    /// <summary>
    /// Provides a simple and intuitive way to search for movies and actors on IMDB.
    /// </summary>
    class Parser
    {
        private Downloader downloader = new Downloader();                
        private HtmlDocument Page;

        #region "Page Loader Events"
        private Parser()
        {

        }

        public static Parser FromMovieTitle(string MovieTitle)
        {
            var newParser = new Parser();
            newParser.Page = newParser.downloader.FindMovie(MovieTitle);
            return newParser;
        }

        public static Parser FromActorName(string ActorName)
        {
            var newParser = new Parser();
            newParser.Page = newParser.downloader.FindActor(ActorName);
            return newParser;
        }

        public static Parser FromMovieID(string MovieID)
        {
            var newParser = new Parser();
            newParser.Page = newParser.downloader.FindKnownMovie(MovieID);
            return newParser;
        }

        public static Parser FromActorID(string ActorID)
        {
            var newParser = new Parser();
            newParser.Page = newParser.downloader.FindKnownActor(ActorID);
            return newParser;
        }
        #endregion

        #region "Page Parsing Methods"
        public string Poster()
        {
            //Logic to scrape the Poster URL from the Page element of this.
            return null;
        }

        public string Title()
        {
            return null;
        }

        public DateTime ReleaseDate()
        {
            return default(DateTime); // placeholder; DateTime is a value type and cannot be null
        }        
        #endregion        
    }
}

-----------------------------------------------

Do you guys think I'm heading towards a good path, or am I setting myself up for a world of hurt later on?

My original thought was to separate the downloading, the parsing and the actual populating so that the library would be easy to extend. Imagine if one day the website changed its HTML: I would then only have to modify the parsing class without touching Downloader.cs or Movie.cs.

Thanks for reading and for helping!

Any other ideas?

A: 

Hi,

I would expose only items that make sense to expose. For your code the end result is movie information. The downloader and parser are useless unless used to get the movie information, so there is no reason to expose them.

Also, in your Movie class I would make the information gettable only, not settable. There is no "save" functionality in the class, so there is no reason to edit the information once you get it.
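A minimal sketch of what get-only properties could look like. The constructor signature and the reduced set of properties are assumptions made for illustration, not part of the original class:

```csharp
using System;
using System.Collections.Generic;

public sealed class Movie
{
    // Values are supplied once, through the constructor (hypothetical signature).
    public Movie(string title, DateTime releaseDate, IEnumerable<string> genres)
    {
        Title = title;
        ReleaseDate = releaseDate;
        // Copy into a read-only wrapper so callers cannot add or remove genres.
        Genres = new List<string>(genres).AsReadOnly();
    }

    // Private setters: consumers of the library can read but not modify.
    public string Title { get; private set; }
    public DateTime ReleaseDate { get; private set; }
    public IList<string> Genres { get; private set; }
}
```

With this shape, only code inside the class can assign the properties, so whatever populates a Movie would go through the constructor (or live in the same assembly with `internal` setters instead).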

Other than that, if this is for other people, I would comment what each class, each member, and each public/private class variable is for. For the Movie class I would probably include a usage example in the class comment.

The last thing, if there is an error in the two private classes, the user of the Movie class needs to be informed somehow. Possibly a public bool variable called success?
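One common way to report success or failure in C# is the Try pattern (a bool return value plus an out parameter), which is a swapped-in alternative to a public success field. The sketch below is only an illustration: `MovieInfo`, `MovieLookup`, and the in-memory dictionary are hypothetical stand-ins for the real download-and-parse step.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the Movie class.
public class MovieInfo
{
    public string Title { get; set; }
}

public static class MovieLookup
{
    // In-memory stand-in for "download the page and parse it".
    private static readonly Dictionary<string, MovieInfo> Known =
        new Dictionary<string, MovieInfo>(StringComparer.OrdinalIgnoreCase)
        {
            { "The Matrix", new MovieInfo { Title = "The Matrix" } }
        };

    // Returns true and sets 'movie' when a match exists;
    // returns false and sets 'movie' to null otherwise.
    public static bool TryFindMovie(string title, out MovieInfo movie)
    {
        return Known.TryGetValue(title, out movie);
    }
}
```

A caller would then write `if (MovieLookup.TryFindMovie("The Matrix", out movie)) { ... }` and handle the failure case explicitly. Throwing a descriptive exception from the lookup is the other usual option.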

On a personal preference note, for your Movie class, I would make your two functions constructors so that I could just build the class as follows.

Movie myMovie = new Movie("Name"); or Movie myMovie = new Movie(1245);

Justin808
+1  A: 

Here are a couple of suggestions, nothing major, just some things to consider.

  1. I understand you wanting to keep the API minimal, thus making the Parser and Downloader private/internal, but you may want to consider making them public anyway. The biggest reason is that since this is going to be an open source project you are most likely going to get folks who are, well, hackers. If by chance they want to do something that isn't directly supported by the API you provide, they will appreciate you making the bits available to them to do it themselves. Make the "standard" use-cases as simple as possible, but also make it easy for folks to do whatever they want with it.

  2. It looks like there is some data duplication between your Movie class, and your Parser. Specifically the parser is getting the fields that are defined by your Movie. It seems like it would make more sense to make Movie a data object (just the properties), and have the Parser class operate on it directly. So your parser FromMovieTitle could return a Movie instead of a Parser. Now that brings up the question of what to do with the methods on the Movie class FindMovie and FindKnownMovie. I would say you could create a MovieFinder class which had those methods in it, and they would utilize the Parser to return a Movie.

  3. It looks like the parsing tasks could get rather complex since you are going to be scraping HTML (at least based on the comments). You may want to consider utilizing a Chain of Responsibility pattern (or something similar) in the parser, with a simple interface that would allow you to create a new implementation for each of the data elements you want to extract. This would keep the Parser class fairly simple, and also allow other folks to more easily extend the code to extract data elements that you may not support directly (again, since this is Open Source, people tend to like easy extensibility).

Generally speaking if you keep the Single Responsibility Principle and Open/Closed Principle in mind along with your goal of keeping the standard usage easy, you should end up with something that people will find easy to use for the things you've thought of supporting, and easy to extend for the things you haven't.
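To make suggestion 3 a bit more concrete, here is a rough sketch of the "something similar" variant: a list of small extractors behind one interface, each responsible for a single data element. It is a simpler relative of Chain of Responsibility rather than the pattern itself. All names (`MovieData`, `IFieldExtractor`, `ExtractorParser`) and the HTML markup are assumptions for illustration; it uses plain strings and regular expressions only to stay self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical, simplified stand-in for the Movie class in the question.
public class MovieData
{
    public string Title { get; set; }
    public string Tagline { get; set; }
}

// One extractor knows how to pull exactly one data element out of the page.
public interface IFieldExtractor
{
    void Extract(string html, MovieData movie);
}

public class TitleExtractor : IFieldExtractor
{
    public void Extract(string html, MovieData movie)
    {
        // Assumed markup; the real page structure will differ.
        Match m = Regex.Match(html, "<h1>(.*?)</h1>");
        if (m.Success) movie.Title = m.Groups[1].Value.Trim();
    }
}

public class TaglineExtractor : IFieldExtractor
{
    public void Extract(string html, MovieData movie)
    {
        // Assumed markup; the real page structure will differ.
        Match m = Regex.Match(html, "<p class=\"tagline\">(.*?)</p>");
        if (m.Success) movie.Tagline = m.Groups[1].Value.Trim();
    }
}

// The parser itself stays trivial: it runs every extractor over the page.
public class ExtractorParser
{
    private readonly IList<IFieldExtractor> _extractors;

    public ExtractorParser(IEnumerable<IFieldExtractor> extractors)
    {
        _extractors = new List<IFieldExtractor>(extractors);
    }

    public MovieData Parse(string html)
    {
        var movie = new MovieData();
        foreach (IFieldExtractor extractor in _extractors)
            extractor.Extract(html, movie);
        return movie;
    }
}
```

Supporting a new data element then means writing one more `IFieldExtractor` and adding it to the list, without touching the parser or the existing extractors.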

ckramer
+4  A: 

Your API is mostly static, meaning you are setting yourself up for maintainability issues in the future. This is because the static methods are actually singletons, which have some significant drawbacks.

I suggest striving for a more instance-based, decoupled approach. This will naturally separate the definition of each operation from its implementation, leaving room for extensibility and configuration. An API's ease-of-use is measured not only by its public surface, but also by its adaptability.

Here is how I would go about designing this system. First, define something which is responsible for fetching movies:

public interface IMovieRepository
{
    Movie FindMovieById(string id);

    Movie FindMovieByTitle(string title);
}

Next, define something which is responsible for downloading HTML documents:

public interface IHtmlDownloader
{
    HtmlDocument DownloadHtml(Uri uri);
}

Then, define a repository implementation which uses a downloader:

public class MovieRepository : IMovieRepository
{
    private readonly IHtmlDownloader _downloader;

    public MovieRepository(IHtmlDownloader downloader)
    {
        _downloader = downloader;
    }

    public Movie FindMovieById(string id)
    {
        var idUri = ...build URI...;

        var html = _downloader.DownloadHtml(idUri);

        return ...parse ID HTML...;
    }

    public Movie FindMovieByTitle(string title)
    {
        var titleUri = ...build URI...;

        var html = _downloader.DownloadHtml(titleUri);

        return ...parse title HTML...;
    }
}

Now, anywhere you need to download movies, you can depend solely on IMovieRepository without being directly coupled to all the implementation details beneath it:

public class NeedsMovies
{
    private readonly IMovieRepository _movies;

    public NeedsMovies(IMovieRepository movies)
    {
        _movies = movies;
    }

    public void DoStuffWithMovie(string title)
    {
        var movie = _movies.FindMovieByTitle(title);

        ...
    }
}

In addition, you can now easily test the parsing logic without having to make web calls. Simply save the HTML and create a downloader which gives it to a repository:

public class TitleHtmlDownloader : IHtmlDownloader
{
    public HtmlDocument DownloadHtml(Uri uri)
    {
        return ...create document from saved HTML...
    }
}

[Test]
public void ParseTitle()
{
    var movies = new MovieRepository(new TitleHtmlDownloader());

    var movie = movies.FindMovieByTitle("The Matrix");

    Assert.AreEqual("The Matrix", movie.Title);

    ...assert other values from the HTML...
}
Bryan Watts
TBH that looks like you're overthinking it...
RCIX
@RCIX: My goal isn't just to move the bits around properly; it is also to minimize the impact of change. Systems written in this style (Dependency Inversion/IoC) are resilient and evolve gracefully. Static/tightly-coupled APIs generally do not, especially when dependent upon a third-party service. A library for other developers warrants a sustainable design.
Bryan Watts
You also might want to take it a step further and create an `IMovieDownloader` and `IMovieParser` so that you can take advantage of the ability to parse other types of documents, say XML or a web service.
jasonh
A: 

Well, first off, I think your primary concern is misguided. In my experience, architectures designed for "ease of use", while pretty to look at with all their encapsulated functionality, tend to be highly interdependent and rigid. As an application built on such a principle grows, you will run into severe problems with dependencies (classes end up directly dependent on more and more, and indirectly dependent on, ultimately, everything in your system). This leads to true maintenance nightmares that dwarf the "ease of use" benefits that you might be gaining.

Two of the most important rules of architecture are Separation of Concerns, and Single Responsibility. These two rules dictate things like keeping infrastructural concerns (data access, parsing) separated from business concerns (finding movies), and making sure each class you write is only responsible for one thing (representing movie information, searching for individual movies.)

Your architecture, while currently small, has already violated Single Responsibility. Your Movie class, while it is elegant, cohesive, and easy to use, is blending two responsibilities: representing movie information, and servicing movie searches. Those two responsibilities should be in separate classes:

// Data Contract (or Data Transfer Object)
public class Movie
{
        public Image Poster { get; set; }
        public string Title { get; set; }
        public DateTime ReleaseDate { get; set; }
        public string Rating { get; set; }
        public string Director { get; set; }
        public List<string> Writers { get; set; }
        public List<string> Genres { get; set; }
        public string Tagline { get; set; }
        public string Plot { get; set; }
        public List<string> Cast { get; set; }
        public string Runtime { get; set; }
        public string Country { get; set; }
        public string Language { get; set; }
}

// Movie database searching service contract
public interface IMovieSearchService    
{
        Movie FindMovie(string Title);
        Movie FindKnownMovie(string ID);
}

// Movie database searching service
public partial class MovieSearchService: IMovieSearchService
{
        public Movie FindMovie(string Title)
        {
            Movie film = new Movie();
            Parser parser = Parser.FromMovieTitle(Title);

            film.Poster = parser.Poster();
            film.Title = parser.Title();
            film.ReleaseDate = parser.ReleaseDate();
            //And so on and so forth.

            return film;
        }

        public Movie FindKnownMovie(string ID)
        {
            Movie film = new Movie();
            Parser parser = Parser.FromMovieID(ID);

            film.Poster = parser.Poster();
            film.Title = parser.Title();
            film.ReleaseDate = parser.ReleaseDate();
            //And so on and so forth.

            return film;
        }
}

This may seem trivial; however, separating the behavior from your data can become critical as a system grows. By creating an interface for your movie search service, you provide decoupling and flexibility. If you, for whatever reason, need to add another type of movie search service that provides the same functionality, you can do so without breaking your consumers. The Movie data type can be reused, and your clients bind to the IMovieSearchService interface rather than a concrete class, allowing the implementations to be interchanged (or multiple implementations used simultaneously). It is best to put the IMovieSearchService interface and Movie data type in a separate project from the MovieSearchService class.

You made a good move by writing the parser class, and keeping parsing separate from the movie search functionality. That meets the rule of Separation of Concerns. However, your approach is going to lead to difficulty. For one, it is based on static methods, which are very inflexible. Every time you need to add a new type of parser, you have to add a new static method, and update any of the code that needs to use that particular parsing type. A better approach is to utilize the power of polymorphism, and ditch static:

public abstract class Parser
{
    public abstract IEnumerable<Movie> Parse(string criteria);
}

public class ByTitleParser: Parser
{
    public override IEnumerable<Movie> Parse(string title)
    {
        // TODO: Logic to parse movie information by title
        // Likely to return one movie most of the time, but some movies from different eras may have the same title
        throw new NotImplementedException();
    }
}

public class ByActorParser: Parser
{
    public override IEnumerable<Movie> Parse(string actor)
    {
        // TODO: Logic to parse movie information by actor
        // This one can return more than one movie, as an actor may act in more than one movie
        throw new NotImplementedException();
    }
}

public class ByIdParser: Parser
{
    public override IEnumerable<Movie> Parse(string id)
    {
        // TODO: Logic to parse movie information by id
        // This one should only ever return a set of one movie, since it is by a unique key
        throw new NotImplementedException();
    }
}

Finally, another useful principle is Dependency Injection. Rather than directly creating new instances of your dependencies, abstract their creation via something like a factory, and inject your dependencies and factories into the services that need them:

public class ParserFactory
{
    public virtual Parser GetParser(string criteriaType)
    {
        if (criteriaType == "bytitle") return new ByTitleParser();
        else if (criteriaType == "byid") return new ByIdParser();
        else throw new ArgumentException("Unknown criteria type.", "criteriaType");
    }
}

// Improved movie database search service
public class MovieSearchService: IMovieSearchService
{
        public MovieSearchService(ParserFactory parserFactory)
        {
            m_parserFactory = parserFactory;
        }

        private readonly ParserFactory m_parserFactory;

        public Movie FindMovie(string Title)
        {
            var parser = m_parserFactory.GetParser("bytitle");
            var movies = parser.Parse(Title); // Parse method creates an enumerable set of Movies that matched "Title"

            var firstMatchingMovie = movies.FirstOrDefault();

            return firstMatchingMovie;
        }

        public Movie FindKnownMovie(string ID)
        {
            var parser = m_parserFactory.GetParser("byid");
            var movies = parser.Parse(ID); // Parse method creates an enumerable set of Movies that matched "ID"

            var firstMatchingMovie = movies.FirstOrDefault();

            return firstMatchingMovie;
        }
}

This improved version has several benefits. For one, it is not responsible for creating instances of the ParserFactory. That allows multiple implementations of the ParserFactory to be used. Early on, you may only search IMDB. In the future, you may wish to search other sites, and alternative parsers for alternative implementations of the IMovieSearchService interface can be provided.
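As a rough illustration of swapping in a different factory, the sketch below substitutes one that hands out a canned parser, which is handy for testing the service without any web access. The types are trimmed-down copies of the ones above so the sketch compiles on its own; `CannedParser` and `CannedParserFactory` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down copies of the types above, so the sketch compiles on its own.
public class Movie
{
    public string Title { get; set; }
}

public abstract class Parser
{
    public abstract IEnumerable<Movie> Parse(string criteria);
}

public class ParserFactory
{
    public virtual Parser GetParser(string criteriaType)
    {
        throw new NotImplementedException("Real parsers omitted in this sketch.");
    }
}

public class MovieSearchService
{
    private readonly ParserFactory m_parserFactory;

    public MovieSearchService(ParserFactory parserFactory)
    {
        m_parserFactory = parserFactory;
    }

    public Movie FindMovie(string title)
    {
        return m_parserFactory.GetParser("bytitle").Parse(title).FirstOrDefault();
    }
}

// Hypothetical test double: returns one movie whose title echoes the criteria.
public class CannedParser : Parser
{
    public override IEnumerable<Movie> Parse(string criteria)
    {
        yield return new Movie { Title = criteria };
    }
}

// Hypothetical factory that always hands out the canned parser.
public class CannedParserFactory : ParserFactory
{
    public override Parser GetParser(string criteriaType)
    {
        return new CannedParser();
    }
}
```

Usage is just `new MovieSearchService(new CannedParserFactory()).FindMovie("The Matrix")`. The service code does not change at all, which is the point of injecting the factory.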

jrista