Hi folks! I have a list of a few thousand terms. Many of them are the same term in different forms. For example: (ruby, a_ruby), (triathlon, triathlete, triathletes), (nonprofit, non_profit, non_profits).
Most of these variants share a large number of characters, but not in exactly the same form. For example, nonprofit and non_profit differ only by the underscore.
What regex pattern (or sequence of patterns) would work best for grouping these variants together? I know that I can use stemming as well, but I'm wondering how I can combine that with the regex.
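
To make the question concrete, here is a rough Python sketch of the kind of combination I have in mind. It is not a definitive solution, and everything in it is an assumption on my part: the function names `normalize` and `group_terms` are made up, the plural stripping is a crude stand-in for a real stemmer, and the 0.7 similarity threshold is a guess that would need tuning on the real list. The regex step removes separators and a leading article, and a fuzzy comparison with `difflib.SequenceMatcher` from the standard library catches pairs like triathlon/triathlete that suffix stripping alone would not merge.

```python
import re
from difflib import SequenceMatcher


def normalize(term):
    """Reduce a term to a rough canonical key (hypothetical helper)."""
    key = term.lower()
    key = re.sub(r"^(a|an|the)_", "", key)   # drop a leading article: a_ruby -> ruby
    key = re.sub(r"[_\-\s]+", "", key)       # drop separators: non_profit -> nonprofit
    key = re.sub(r"(?<=[^s])s$", "", key)    # naive plural strip, NOT a real stemmer
    return key


def group_terms(terms, threshold=0.7):
    """Greedily group terms whose keys are similar enough.

    threshold is an assumed cutoff on SequenceMatcher.ratio(); lower values
    merge more aggressively and risk false positives.
    """
    groups = []  # list of (representative key, [original terms])
    for term in terms:
        key = normalize(term)
        for rep, members in groups:
            if SequenceMatcher(None, key, rep).ratio() >= threshold:
                members.append(term)
                break
        else:
            # no existing group was close enough, so start a new one
            groups.append((key, [term]))
    return [members for _, members in groups]


if __name__ == "__main__":
    sample = [
        "ruby", "a_ruby",
        "triathlon", "triathlete", "triathletes",
        "nonprofit", "non_profit", "non_profits",
    ]
    for group in group_terms(sample):
        print(group)
```

The greedy loop compares each term against every existing group, which is quadratic in the worst case; for a few thousand terms that should still be manageable. If a proper stemmer is available, it could replace the plural-stripping line inside `normalize`.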