Split the lists into individual items and work on that level.
Some tables:
lists
- ID (PK)
- sequence (the "A-B-C-D" entries above)
- [whatever else]
items
- ID (PK)
- name (value, word, whatever makes sense)
- [whatever else]
list_items
- list_ID
- item_ID
- [an ordinal int, if "G-H-B-A" and "A-B-G-H" are considered different sequences]
(composite PK list_ID, item_ID [, ordinal] on that one, basic many:many relation)
Some data, so it's more clear what the tables represent:
INSERT INTO items (ID, name) VALUES (1, 'A'), (2, 'B'), (3, 'G'), (4, 'H');
INSERT INTO lists (ID, sequence) VALUES (1, 'A-B-G-H');
INSERT INTO list_items (list_ID, item_ID) VALUES (1, 1), (1, 2), (1, 3), (1, 4);
INSERT INTO lists (ID, sequence) VALUES (2, 'B-A-G');
INSERT INTO list_items (list_ID, item_ID) VALUES (2, 2), (2, 1), (2, 3);
And finally, to find lists that contain all items (A, B, G, H):
SELECT lists.sequence FROM lists
JOIN list_items ON lists.ID = list_items.list_ID
JOIN items AS i1 ON list_items.item_ID = i1.ID HAVING i1.name = 'A'
JOIN items AS i2 ON list_items.item_ID = i2.ID HAVING i2.name = 'B'
JOIN items AS i3 ON list_items.item_ID = i3.ID HAVING i3.name = 'G'
JOIN items AS i4 ON list_items.item_ID = i4.ID HAVING i4.name = 'H'
That should return any lists like "A-B-G-H", "G-H-A-B", "H-A-T-B-A-G", etc, but not "B-U-G-H-U-T" (no A) or "B-A-T-H" (no G) - all conditions have to be satisfied. Doing an "any" search might be a little more involved (writing this in my head over lunch, but RIGHT JOIN
alone would probably result in all kinds of duplicates & slowness).
It won't map any genomes or redefine human language, but should be okay for a decent-sized data set. Either way, I'd avoid storing each list as a varchar and doing "WHERE sequence LIKE '%A%' AND sequence LIKE '%B%'
" stuff unless you absolutely can't handle the extra work to add new data.