views:

458

answers:

5

I'm making a large number of seqlogos programmatically. They are hundreds of columns wide and so running a seqlogo normally creates letters that are too thin to see. I've noticed that I only care about a few of these columns (not necessarily consecutive columns) ... most are noise but some are highly conserved.

I use something like this snippet:

wide_seqs = cell2mat(arrayfun(@randseq, repmat(200, [500 1]), 'uniformoutput', false));
wide_seqs(:, [17,30, 55,70,130]) = repmat(['ATCGG'], [500 1])

conserve_cell = seqlogo(wide_seqs, 'displaylogo', false);
high_bit_cols = any(conserve_cell{2}>1.0,1);
[~, handle] = seqlogo(wide_seqs(:,high_bit_cols ));

Although when I do this I lose the information about which columns the data came from.

Normally I would just change the x-axis of the seqlogo. However, seqlogo's are some sort of crazy java-based object and calls like:

set(handle, 'xticklabel', num2str(find(high_bit_cols)))

don't work. Any help would be greatly appreciated.

Thanks, Will

EDIT:

On the bounty I'm willing to accept any kind of crazy method for changing the axis labels include (but not limited to): Using the image-processing toolbox to modify the image after saving, creating a new seqlogo function using textboxes, modifying the java-code (if possible), etc. I'm NOT willing to accept things like "use python", "use this R library" or any other sort of non-Matlab solution.

+1  A: 

About the x-axis, it seems that the figure contains no standard axis (findobj(handle,'type','axes') is empty), rather a custom object of class com.mathworks.toolbox.bioinfo.sequence.SequenceLogo ...

On an unrelated note, you can replace your first line with a simpler call to:

wide_seqs = reshape(randseq(200*500),[],200);
Amro
helpful but doesn't really answer the question.
JudoWill
A: 

If the axes are a java object, then you may want to have a look at its methods and properties with uiinspect. This might give you an idea what you should edit to get the behavior you want (unfortunately, I don't have the toolbox, so I can't look it up for you).

Jonas
I took a look at the object with uiinspect ... The java object is very sparse and it doesn't seem to have anything related to the axis. I've submitted a support ticket to Mathworks, maybe I'lly get lucky.
JudoWill
+3  A: 

OK, I killed a few hours with this problem. It appears that you cannot place any MATLAB object (axes or textbox) on the top of that hgjavacomponent object. And I couldn't modified the java code, of course. So the only feasible solution I found is to create the figure from scratch.

I didn't want to rewrite the code to calculate weight matrices (symbols heights), you already did that. But it can be done, if you don't want to use MATLAB's seqlogo at all. So I've changed your last line a little to get the matrix:

[wm, handle] = seqlogo(wide_seqs(:,high_bit_cols ));

The problem with text symbols is that you cannot exactly control their size, cannot fit the symbol to textbox. This is probably why MATLAB decided to go with java graphic object. But we can create symbols images and deal with them.

Here is code to create images of letters:

letters = wm{1};
clr = [0 1 0; 0 0 1; 1 0.8 0;1 0 0]; % corresponding colors
for t = 1:numel(letters)
    hf = figure('position',[200 200 100 110],'color','w');
    ha = axes('parent',hf, 'visible','off','position',[0 0 1 1]);
    ht = text(50,55,letters(t),'color',clr(t,:),'units','pixels',...
        'fontsize',100,'fontweight','norm',...
        'vertical','mid','horizontal','center');
    F = getframe(hf); % rasterize the letter
    img = F.cdata;
    m = any(img < 255,3); % convert to binary image
    m(any(m,2),any(m,1))=1; % mask to cut white borders
    imwrite(reshape(img(repmat(m,[1 1 3])),[sum(any(m,2)) sum(any(m,1)) 3]),...
        [letters(t) '.png'])
    close(hf)
end

Then we use those images to draw new seqlogo plot:

xlabels = cellstr(num2str(find(high_bit_cols)'));
letters = wm{1};
wmat=wm{2}; % weight matrix from seqlogo
[nletters  npos] = size(wmat);
wmat(wmat<0) = 0; % cut negative values

% prepare the figure
clf
hAx = axes('parent',gcf,'visible','on');
set(hAx,'XLim',[0.5 npos+0.5],'XTick',1:npos,'XTickLabel',xlabels)
ymax = ceil(max(sum(wmat)));
ylim([0 ymax])
axpos = get(hAx,'Position');
step = axpos(3)/npos;

% place images of letters
for i=1:npos
    [wms idx] = sort(wmat(:,i)); % largest on the top
    let_show = letters(idx);
    ybot = axpos(2);
    for s=1:nletters
        if wms(s)==0, continue, end;
        axes('position',[axpos(1) ybot step wms(s)/ymax*axpos(4)])
        ybot = ybot + wms(s)/ymax*axpos(4);
        img = imread([let_show(s) '.png']);
        image(img)
        set(gca,'visible','off')
    end
    axpos(1)=axpos(1)+step;
end

Here is the result: alt text

The code and figure can be further improved, of course, but I hope this is something you can start working with. Let me know if I miss something.

yuk
looks great ... I was playing around with a method to use textboxes but had the some problem I'm sure u noticed, you can't for a letter to occupy the whole box. I'll play around with it for a few minutes to make sure it works ... I may play around with it to remove the temporary writing of the letters, but that's just personal preference. GREAT JOB
JudoWill
Making images from text and then placing them: Sweet. +1 I'm almost sorry that I like gnovices approach even better.
Jonas
+3  A: 

I ran into the same problems yuk did trying to modify the figure from SEQLOGO, so here's my attempt at my own version to mimic its appearance. It's a function seqlogo_new.m that you give two arguments: your sequence and an optional minimum bit value. It requires an image file ACGT.jpg that can be found at this link.

Here's the code for the function:

function hFigure = seqlogo_new(S,minBits)
%# SEQLOGO_NEW   Displays sequence logos for DNA.
%#   HFIGURE = SEQLOGO_NEW(SEQS,MINBITS) displays the
%#   sequence logo for a set of aligned sequences SEQS,
%#   showing only those columns containing at least one
%#   nucleotide with a minimum bit value MINBITS. The
%#   MINBITS parameter is optional. SEQLOGO_NEW returns
%#   a handle to the rendered figure HFIGURE.
%#
%#   SEQLOGO_NEW calls SEQLOGO to perform some of the
%#   computations, so to use this function you will need
%#   access to the Bioinformatics Toolbox.
%#
%#   See also seqlogo.

%# Author: Ken Eaton
%# Version: MATLAB R2009a
%# Last modified: 3/30/10
%#---------------------------------------------------------

  %# Get the weight matrix from SEQLOGO:

  W = seqlogo(S,'DisplayLogo',false);
  bitValues = W{2};

  %# Select columns with a minimum bit value:

  if nargin > 1
    highBitCols = any(bitValues > minBits,1);  %# Plot only high-bit columns
    bitValues = bitValues(:,highBitCols);
  else
    highBitCols = true(1,size(bitValues,2));   %# Plot all columns
  end

  %# Sort the bit value data:

  [bitValues,charIndex] = sort(bitValues,'descend');  %# Sort the columns
  nSequence = size(bitValues,2);                      %# Number of sequences
  maxBits = ceil(max(bitValues(:)));                  %# Upper plot limit

  %# Break 4-letter image into a 1-by-4 cell array of images:

  imgACGT = imread('ACGT.jpg');                %# Image of 4 letters
  [nRows,nCols,nPages] = size(imgACGT);        %# Raw image size
  letterIndex = round(linspace(1,nCols+1,5));  %# Indices of letter tile edges
  letterImages = {imgACGT(:,letterIndex(1):letterIndex(2)-1,:), ...
                  imgACGT(:,letterIndex(2):letterIndex(3)-1,:), ...
                  imgACGT(:,letterIndex(3):letterIndex(4)-1,:), ...
                  imgACGT(:,letterIndex(4):letterIndex(5)-1,:)};

  %# Create the image texture map:

  blankImage = repmat(uint8(255),[nRows round(nCols/4) 3]);  %# White image
  fullImage = repmat({blankImage},4,2*nSequence-1);  %# Cell array of images
  fullImage(:,1:2:end) = letterImages(charIndex);    %# Add letter images
  fullImage = cat(1,cat(2,fullImage{1,:}),...        %# Collapse cell array
                    cat(2,fullImage{2,:}),...        %#   to one 3-D image
                    cat(2,fullImage{3,:}),...
                    cat(2,fullImage{4,:}));

  %# Initialize coordinates for the texture-mapped surface:

  X = [(1:nSequence)-0.375; (1:nSequence)+0.375];
  X = repmat(X(:)',5,1);     %'# Surface x coordinates
  Y = [zeros(1,nSequence); cumsum(flipud(bitValues))];
  Y = kron(flipud(Y),[1 1]);  %# Surface y coordinates
  Z = zeros(5,2*nSequence);   %# Surface z coordinates

  %# Render the figure:

  figureSize = [602 402];                   %# Figure size
  screenSize = get(0,'ScreenSize');         %# Screen size
  offset = (screenSize(3:4)-figureSize)/2;  %# Offset to center figure
  hFigure = figure('Units','pixels',...
                   'Position',[offset figureSize],...
                   'Color',[1 1 1],...
                   'Name','Sequence Logo',...
                   'NumberTitle','off');
  axes('Parent',hFigure,...
       'Units','pixels',...
       'Position',[60 100 450 245],...
       'FontWeight','bold',...
       'LineWidth',3,...
       'TickDir','out',...
       'XLim',[0.5 nSequence+0.5],...
       'XTick',1:nSequence,...
       'XTickLabel',num2str(find(highBitCols)'),...  %'
       'YLim',[-0.03 maxBits],...
       'YTick',0:maxBits);
  xlabel('Sequence Position');
  ylabel('Bits');
  surface(X,Y,Z,fullImage,...
          'FaceColor','texturemap',...
          'EdgeColor','none');
  view(2);

end

And here are a few examples of its usage:

S = ['ATTATAGCAAACTA'; ...  %# Sample data
     'AACATGCCAAAGTA'; ...
     'ATCATGCAAAAGGA'];
seqlogo_new(S);             %# A normal plot similar to SEQLOGO

alt text

seqlogo_new(S,1);        %# Plot only columns with bits > 1

alt text

gnovice
the nice thing about this answer is that you can actually use this as a "normal" figure ... you can use it as a subplot or put it into an arbitrary axes object. With yuk's answer, since he places axes objects manually it can't be resized.
JudoWill
@JudoWill: I took a non-conventional approach by making all the displayed letters part of one large image that is texture-mapped onto a surface to stretch them as needed. This allowed me to have only 1 set of axes, but it's probably a little hard to follow what the code is doing. ;) Another approach (kind of a middle ground between yuk's answer and mine) would be to plot each letter as its own image object in one set of axes, which *may* be more graphically efficient than what I do above.
gnovice
I like this one even more than yuk's. +1
Jonas
+1  A: 

So I've created another solution using pieces of both yuk and gnovice's solution. As I played around with the solution I realized I would really like to be able to use the output as "subplots" and be able to change the color of letters arbitrarily.

Since yuk used programitically placed axes objects with the letter embedded it would have been very annoying (although not impossible) to modify his code to plot into an arbitrary axes object. Since gnovice's solution read the letters from a pre-created file it would have been difficult to modify the code to run against arbitrary color schemes or letter choices. So my solution uses the "letter generation" code from yuk's solution and the "image superimposing" method from gnovice's solution.

There is also a significant amount of argument parsing and checking. Below is my combined solution ... I'm including it only for completeness, I obviously can't win my own bounty. I'll let the community decide the award and give the bounty to whoever has the highest rating at the end of the time-limit ... in the event of a tie I'll give it to the person with the lowest rep (they probably "need" it more).

function [npos, handle] = SeqLogoFig(SEQ, varargin)
%   SeqLogoFig
%       A function which wraps around the bioinformatics SeqLogo command
%       and creates a figure which is actually a MATLAB figure.  All
%       agruements for SEQLOGO are passed along to the seqlogo calculation.
%       It also supports extra arguements for plotting.
%
%   [npos, handle] = SeqLogoFig(SEQ);
%
%       SEQ             A multialigned set of sequences that is acceptable
%                       to SEQLOGO.
%       npos            The positions that were actually plotted
%       handle          An axis handle to the object that was plotted.
%
%   Extra Arguements:
%       
%       'CUTOFF'        A bit-cutoff to use for deciding which columns to
%                       plot.  Any columns that have a MAX value which is
%                       greater than CUTOFF will be provided.  Defaults to
%                       1.25 for NT and 2.25 for AA.
%
%       'TOP-N'         Plots only the top N columns as ranked by thier MAX
%                       bit conservation.
%
%       'AXES_HANDLE'   An axis handle to plot the seqlogo into.
%       
%       'INDS'          A set of indices to to plot.  This overrides any
%                       CUTOFF or TOP-N that were provided
%
%
%
%

%% Parse the input arguements
ALPHA = 'nt';
MAX_BITS = 2.5;
RES = [200 80];
CUTOFF = [];
TOPN = [];
rm_inds = [];
colors = [];
handle = [];
npos = [];


for i = 1:2:length(varargin)
    if strcmpi(varargin{i}, 'alphabet')
        ALPHA = varargin{i+1};

    elseif strcmpi(varargin{i}, 'cutoff')
        CUTOFF = varargin{i+1};
        %we need to remove these so seqlogo doesn't get confused
        rm_inds = [rm_inds i, i+1]; %#ok<*AGROW>

    elseif strcmpi(varargin{i}, 'colors')
        colors = varargin{i+1};
        rm_inds = [rm_inds i, i+1]; 
    elseif strcmpi(varargin{i}, 'axes_handle')
        handle = varargin{i+1};
        rm_inds = [rm_inds i, i+1]; 
    elseif strcmpi(varargin{i}, 'top-n')
        TOPN = varargin{i+1};
        rm_inds = [rm_inds i, i+1];
    elseif strcmpi(varargin{i}, 'inds')
        npos = varargin{i+1};
        rm_inds = [rm_inds i, i+1];
    end
end

if ~isempty(rm_inds)
    varargin(rm_inds) = [];
end

if isempty(colors)
    colors = GetColors(ALPHA);
end

if strcmpi(ALPHA, 'nt')
    MAX_BITS = 2.5;
elseif strcmpi(ALPHA, 'aa')
    MAX_BITS = 4.5;
end

if isempty(CUTOFF)
    CUTOFF = 0.5*MAX_BITS;
end


%% Calculate the actual seqlogo.
wm = seqlogo(SEQ, varargin{:}, 'displaylogo', false);


%% Generate the letters
letters = wm{1};
letter_wins = cell(size(letters));
[~, loc] = ismember(letters, colors(:,1));
loc(loc == 0) = size(colors,1);
clr = cell2mat(colors(loc, 2)); % corresponding colors
for t = 1:numel(letters)
    hf = figure('position',[200 200 100 110],'color','w');
    ha = axes('parent',hf, 'visible','off','position',[0 0 1 1]);
    ht = text(50,55,letters(t),'color',clr(t,:),'units','pixels',...
        'fontsize',100,'fontweight','norm',...
        'vertical','mid','horizontal','center');
    F = getframe(hf); % rasterize the letter
    img = F.cdata;
    m = any(img < 255,3); % convert to binary image
    m(any(m,2),any(m,1))=1; % mask to cut white borders
    letter_wins{t} = reshape(img(repmat(m,[1 1 3])),[sum(any(m,2)) sum(any(m,1)) 3]);
    close(hf);
end


%% Use the letters to generate a figure

%create a "image" that will hold the final data
wmat = wm{2};
if isempty(npos)
    if isempty(TOPN)
        npos = find(any(wmat>CUTOFF,1));
    else
        [~, i] = sort(max(wmat,[],1), 'descend');
        npos = sort(i(1:TOPN));
    end
end

fig_data = 255*ones(RES(1), RES(2)*(length(npos)+1)+length(npos)*2,3);
bitscores = linspace(0, MAX_BITS, size(fig_data,1));
tick_pos = zeros(length(npos),1);
% place images of letters
for i=1:length(npos)
    [wms idx] = sort(wmat(:,npos(i)), 'descend'); % largest on the top
    bits = [flipud(cumsum(flipud(wms))); 0];
    let_data = letter_wins(idx(wms>0));
    for s=1:length(let_data)
        start_pos = find(bitscores>=bits(s),1);
        end_pos = find(bitscores<=bits(s+1),1, 'last');
        if isempty(start_pos) || isempty(end_pos) || end_pos > start_pos
            continue
        end
        img_win = imresize(let_data{s}, [start_pos-end_pos, RES(2)]);

        fig_data(start_pos-1:-1:end_pos, (i*RES(2)-RES(2)*.5:i*RES(2)+RES(2)*.5-1)+2*i,:) = img_win;
    end
    tick_pos(i) = i*RES(2)+2*i;
end
if ~isempty(handle)
    image(handle,[0 size(fig_data,2)], [0 MAX_BITS],fig_data./255)
else
    handle = image([0 size(fig_data,2)], [0 MAX_BITS],fig_data./255);
end
set(gca, 'ydir', 'normal', 'xtick', tick_pos, ...
        'userdata', tick_pos, 'xticklabel', npos);
xlabel('position')
ylabel('bits')


function colors = GetColors(alpha)
% get the standard colors for the sequence logo
if strcmpi(alpha, 'nt')
    colors = cell(6,2);
    colors(1,:) = {'A', [0 1 0]};
    colors(2,:) = {'C', [0 0 1]};
    colors(3,:) = {'G', [1 1 0]};
    colors(4,:) = {'T', [1 0 0]};
    colors(5,:) = {'U', [1 0 0]};
    colors(6,:) = {'', [1 0 1]};
elseif strcmpi(alpha, 'aa')
    colors = cell(21,2);
    colors(1,:) = {'G', [0 1 0]};
    colors(2,:) = {'S', [0 1 0]};
    colors(3,:) = {'T', [0 1 0]};
    colors(4,:) = {'Y', [0 1 0]};
    colors(5,:) = {'C', [0 1 0]};
    colors(6,:) = {'Q', [0 1 0]};
    colors(7,:) = {'N', [0 1 0]};
    colors(8,:) = {'A', [1 165/255 0]};
    colors(9,:) = {'V', [1 165/255 0]};
    colors(10,:) = {'L', [1 165/255 0]};
    colors(11,:) = {'I', [1 165/255 0]};
    colors(12,:) = {'P', [1 165/255 0]};
    colors(13,:) = {'W', [1 165/255 0]};
    colors(14,:) = {'F', [1 165/255 0]};
    colors(15,:) = {'M', [1 165/255 0]};
    colors(16,:) = {'D', [1 0 0]};
    colors(17,:) = {'E', [1 0 0]};
    colors(18,:) = {'K', [0 0 1]};
    colors(19,:) = {'R', [0 0 1]};
    colors(20,:) = {'H', [0 0 1]};
    colors(21,:) = {'', [210/255 180/255 140/255]};
else
    error('SeqLogoFigure:BADALPHA', ...
            'An unknown alphabet was provided: %s', alpha)
end

I've submitted this to the Mathworks FileExchange ... when its approved I'll post a link.

The only nagging annoyance I have is that as it creates the letter images it displays little figure windows in rapid speed. If anyone knows a trick that could get avoid that I'd love to hear it.

EDIT: Mathworks has approved my submitted file ... you can download it at the FileExchange here: http://www.mathworks.com/matlabcentral/fileexchange/27124

JudoWill
@JudoWill: Very nice. With respect to keeping those little figure windows from appearing, I think you can just set the `'Visible'` property of the figure windows to `'off'` when you create them.
gnovice
@gnovice: If I recall correctly, getframe requires the figure to be visible. @JudoWill: It may be a good idea to save the letters to disk, so that the figures flash only the first time the particular letters are being created.
Jonas
@Jonas: You may be right. I think they need to state that clearly in the documentation, since I only saw mentions of virtual desktops that aren't visible as opposed to invisible figures.
gnovice
@gnovice: I just tested it - getframe indeed needs visible figures. If they're invisible, getframe sets 'visible' to 'on', which won't help JudoWill's problem.
Jonas
I'll easily live with a few flashing axes if I can have a seqlogo as a real figure. This has been a minor annoyance for ~2 years and I finally got to a project that absolutely needed seqlogo to be a real figure.
JudoWill
@Jonas: You're right. I was mistaking the behavior of GETFRAME with the behavior of PRINT. The PRINT function can save a figure to a file even when the figure has its visibility turned off.
gnovice