For a simple algorithm, let's assume that you can compute a bounding box around each word, and that you have an image with a mask of the shape that you want to fill.
Sweep down from the top of the image mask until you find a horizontal run of mask pixels at least as wide as the first word's bounding box. See if you can extend that run downward into a rectangle the size of the bounding box. If so, drop the first word there. If not, keep sweeping.
Once you drop a word, see if you can extend the bounding box to the width of (first box + space + second box) and the height of max(first box, second box). If so, drop the second word there and try the next word the same way. If not, find the widest rectangle of that height that fits within your image mask on that line, center the words placed so far left-to-right within it, remove that rectangle from the mask, and keep going.
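As a rough illustration, the sweep-and-extend steps above could be sketched in Python like this. It is only a sketch under some assumptions of mine: the mask is a list of rows of booleans (True means "inside the shape and still free"), each word is given only as a (width, height) box in pixels, and every word on a line is aligned to the top of the line. The names `fits`, `find_slot`, and `fill_shape` are made up for the example.

```python
def fits(mask, top, left, w, h):
    """True if the w-by-h rectangle at (top, left) lies entirely on free mask pixels."""
    if top < 0 or left < 0 or top + h > len(mask) or left + w > len(mask[0]):
        return False
    return all(mask[y][x] for y in range(top, top + h)
               for x in range(left, left + w))


def find_slot(mask, w, h):
    """Sweep top to bottom, then left to right; return the first (top, left) that fits."""
    for top in range(len(mask) - h + 1):
        for left in range(len(mask[0]) - w + 1):
            if fits(mask, top, left, w, h):
                return top, left
    return None


def fill_shape(mask, boxes, space=1):
    """Place word boxes (w, h) in reading order; return a list of (index, top, left)."""
    mask = [row[:] for row in mask]  # work on a copy; the caller's mask is untouched
    placements = []
    i = 0
    while i < len(boxes):
        w, h = boxes[i]
        slot = find_slot(mask, w, h)
        if slot is None:
            break  # no room left for this word; stop (assumption: no resizing)
        top, left = slot
        line, line_w, line_h = [i], w, h
        # Try to grow the rectangle so the next word also fits on this line.
        while line[-1] + 1 < len(boxes):
            nw, nh = boxes[line[-1] + 1]
            new_w, new_h = line_w + space + nw, max(line_h, nh)
            if not fits(mask, top, left, new_w, new_h):
                break
            line.append(line[-1] + 1)
            line_w, line_h = new_w, new_h
        # Widen the strip to the right as far as the mask allows. The sweep
        # already returned the leftmost position, so it cannot grow leftward.
        right = left + line_w
        while fits(mask, top, right, 1, line_h):
            right += 1
        # Center the line of words inside the strip.
        x = left + (right - left - line_w) // 2
        for j in line:
            placements.append((j, top, x))
            x += boxes[j][0] + space
        # Remove the used strip from the mask.
        for y in range(top, top + line_h):
            for cx in range(left, right):
                mask[y][cx] = False
        i = line[-1] + 1
    return placements


# A plain 10x4 rectangle as the "shape", just to show the call.
demo_mask = [[True] * 10 for _ in range(4)]
print(fill_shape(demo_mask, [(3, 2), (3, 2), (4, 2)]))
```

This rescans the mask pixel by pixel for every candidate position, so it is slow on large images, but it keeps the logic close to the description.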
You can make this slightly fancier by insisting that lines have the same baseline even if broken by the shape (e.g. lines across the top nubs of the heart); you then need to have an alternate "continue along this baseline" condition. But the basic idea above, fitting rectangles inside an image mask and removing each one from the mask once it is filled, will do the job.
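One possible way to get the "continue along this baseline" behaviour is to find every free horizontal run in the band of rows that the line occupies, then keep placing words in each run in turn. The sketch below assumes the same list-of-boolean-rows mask as before and a fixed line height; `free_runs` and `split_line_across_runs` are hypothetical helper names, not part of any library.

```python
def free_runs(mask, top, height):
    """Return (left, width) for each maximal run of columns that are
    free on every row from top to top + height - 1."""
    runs, start = [], None
    width = len(mask[0])
    for x in range(width + 1):  # one extra step to close a run at the right edge
        free = x < width and all(mask[y][x] for y in range(top, top + height))
        if free and start is None:
            start = x
        elif not free and start is not None:
            runs.append((start, x - start))
            start = None
    return runs


def split_line_across_runs(runs, widths, space=1):
    """Greedily place word widths into the runs, left to right.
    Returns (word_index, left) pairs; words that do not fit are left out."""
    placed, i = [], 0
    for left, run_w in runs:
        x = left
        while i < len(widths) and x + widths[i] <= left + run_w:
            placed.append((i, x))
            x += widths[i] + space
            i += 1
    return placed


# Two "nubs" separated by a gap, like the top of a heart.
T, F = True, False
band = [
    [F, T, T, F, F, T, T, T, F],
    [T, T, T, T, F, T, T, T, T],
]
runs = free_runs(band, 0, 2)
print(runs, split_line_across_runs(runs, [2, 3, 1]))
```

Words left unplaced at the end of the band would carry over to the next line, and each run could still be centered individually as in the basic version.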
(It is faster to use geometric operations than the pixel-based ones described here, but then you have to worry about all the cases for figuring out how a bounding box fits within an arbitrary polygon, and that's a bit long to explain here.)