views:

242

answers:

1

i am doing a database design using EAV. I am facing an issue when i try to model an entity with attribute having multiple values?

For example

Entity

id         | name           | description
--         | ----           | ------------ 
1          | configuration1 | configuration1

Attribute

id         | entityId    | name  | type
--         | --------    | ----  | ----
1          | 1           | att1  | string
2          | 1           | att2  | int
3          | 1           | att3  | List<String>  (How will i model this?)

Value

id        | attributeId    | value
--        | -----------    | -----
1         | 1              | a    
2         | 2              | 1
3         | 3              | b
4         | 3              | c    
5         | 3              | d

Is this the correct way to handle list of values?

Please provide any helpful link to model this?

Two more questions

1) Is type as List correct? I want to be sure that when one attribute has multiple values i will give the type as List

2) How the database design will change when attribute corresponds to an Object? For example User has Address..How will i handle composite parameters?

It would be great if you can provide me a rough tabular representation or diagram

Thanks

Shekhar

+2  A: 

Yes! it is quite possible to have multi-value attributes attributes in EAV.
In fact it is easier than with the traditional relational model where one would need to either create an extra column, or store the multiple values with a delimited format of sort; both such approach suffer added complexity when querying the database for a given value of the underlying field (attribute).

The simplest way to have multiple values is simply to have an extra value record! (as shown in the question)
In addition, the EAV store structure can be changed to explicitly accommodate multi-values, with:

  • an extra boolean-like field in the attribute table to indicate whether the field can or cannot be multi-value. (BTW, similar properties of the attribute can be codified as well, for example whether the attribute is required or not etc
  • The value table can get an extra column to indicate the sequence number of the value (set to 0 or 1 for all non multivalue attributes, otherwise, an incremented integer).

As said, these changes to the physical schema of the EAV store are not necessary, but they can be used to ensure that the data conforms to the (logical) schema as well as to maybe display the several values of a multivalue attributes in a particular order etc.

Edit: (details on implementing multi-value and/or composite ("object-like") attributes)
If you are absolutely positive that the multiple ["sub"-]values which constitute an attribute (or similarly the multiple parts which constitute an "object type" attribute), are completely atomic, i.e. will never be searched or displayed (or ...) individually, you can store such an attribute' value "sets", as a single record in the value table, by encoding the multiple values into a single string; For this purpose, JSON or XML-at-large comes to mind and seems particularly interesting for very expandable/generic, but any other format which you can parse in and out in a reliable fashion would also work (say delimited format).

A more "natural" way (EAV-wise) to store such "attribute value parts", is by storing them individually (in multiple records of the value table, possibly with a sequence field as hinted earlier). This approach allows handling the "sub parts", in some contexts, as if they were attributes.

In both cases, you need to alter the attribute table to add the necessary properties and types codes, to describe such multi-part attributes. Similarly to the approach of storing the data (in the value table), you can either make the attribute record such that all the info for a given [multi-part] attribute is stored in a single attribute record, or, [and this is typically easier and more flexible] you can create one attribute per part, plus one attribute to "tie them together" (for example with a property which contains a string delimted with each of the attribute ID values of the subparts.

For example:
A composite attribute for metallic pipe items, could be the diameter, made of two parts: a numeric value and a unit code (Millimeter, vs. Inches).
With the first approach:
- there would be one record in the attribute table, with a type indicating that this is a multi-value, and an extension property to contain the [ordered] list of types of the individual sub parts.
- there would be a single record in the value table, containing a coded value such as as say "0.75|Inch" (or <diam>0.75</diam><unit>Inch</unit>).
With the second approach:
- there would be 3 records in the attribute table: a record or type numeric and named say "diamvalue", a record of type string, named "unit" and a record of type composite name "Diameter"; this last record would somehow have a reference to the ID of the two other attributes (a simple comma delimited string comes to mind) - there would be two records in the value table, one each of the diamvalue and the unit attributes (such records would have an additional field, called say "parent" containing the AttributeID of the "Diameter" attribute. Optionally there could be also an value record for the "Diameter" attribute [I personally find this redundant with the "parent" property.

As indicated earlier, the main advantage of the second solution is that [when appropriate] one can query the catalog for a particular set of items based on the value of an attribute part, for example searching for all pipes which have a metric unit. Such queries are resolved at the level of SQL, whereby with the first approach, SQL would have to scan all attribute values for the attribute "Diameter" and parse the value to search for the unit code.

A picture's worth a thousand words ;-)
This diagram shows a possible layout with sample data for the "second approach".

Entity 
    id   | name           | description
    --   | ----           | ------------ 
    1    | configuration1 | configuration1

Attribute 
    id   | name      | type     | Required | Repeats | SubAttribIdList
    --   | ----      | ----     | -------- | ------- | ---------------
    1    | att1      | string   | N        | N       | null   (only applicable to composite types)
    2    | att2      | int      | Y        | N       | null
    3    | att3      | string   | Y        | Y       | null
    4    | DiamValue | numeric  | Y        | N       | null
    5    | Unit      | string   | Y        | N       | null
    6    | Diameter  | composite| N        | N       | 4,5

Value
    id  | entityId| attributeId  | ParentAttribId |SeqNr | value     
    --  | --------| -----------  | -------------- |----- | -----
    1   | 1       | 1            | null           | 1    | a    
    2   | 1       | 2            | null           | 1    | 1
    3   | 1       | 3            | null           | 1    | b  (this value and next show show a repeating attribute)
    4   | 1       | 3            | null           | 2    | c    
    5   | 1       | 3            | null           | 3    | d
    6   | 1       | 4            | 6              | 1    | 0.75   (this value and next one shows a composite attribute
    7   | 1       | 5            | 6              | 1    | Inches

A few notes:
- The SeqNr for the values ids 6 and 7 is 1, for both. Their order is implicit to the SubAttribIdList. If Attribute id 6 had been made a multi-value ("Repeats") attribute, an entity could have additional couplets of two values, sequenced, in pair, 2, 3 etc.
- The sequence Number for non repeatable Attributes is set to 1, systematically, could as well be NULL this this doesn't apply.
- The "Required" property of attribute doesn't figure in the multi-value or composite question; I just added it as it is commonly used to help the application (or the entity access layer) enforce various integrity rules.
- A few of the design choices in this layout imply a maximum of 1 level of inclusion for composite attributes (a composite cannot be a included in a composite), as well as prevent a composite to include a multi-value attribute. These limitations can be avoided with the proper structure (and with a bit of added complexity in the access layer), but the simpler schema is typically acceptable (attributes that would require such fancy structure are often a indicative of a flaw in the logical schema).

mjv
Thanks for your answer..I have edited the question..can you please provide the answer
Shekhar
@Shekhar: I just took the time to "draw" a visual example of the way multi-value and/or composite attributes can be stored. In so doing, I noted a mistake in _your_ proposed schema: the EntityID field belongs to the value table, not the attribute table; that's fairly self evident (otherwise you'd have almost as many attribute records as values etc.), 't was probably a error you introduced in transcribing the schema onto this painful space-formatted layout.
mjv