views:

141

answers:

3

I want to extract a word from a string column of a table.

description
===========================
abc order_id: 2 xxxx yyy aa
mmm order_id: 3 nn kk yw

Expected result set

order_id
===========================
2
3

Table will at most have 100 rows, text length is ~256 char and column always has one order_id present. So performance is not an issue.

In Oracle, I can use REGEXP_SUBSTR for this problem. How would I solve this in MySQL?

Edit 1

I am using LOCATE and SUBSTR to solve the problem. The code is ugly. Ten minutes after writing the code, I am cursing the guy who wrote such an ugly code.

I didn't find the REGEXP_SUBSTR function in MySQL docs. But I am hoping that it exists..

Answer to : Why cant the table be optimized? Why is the data stored in such a dumb fashion?

The example I gave just denotes the problem I am trying to solve. In real scenario, I am using a DB based 3rd party queuing software for executing asynchronous tasks. The queue serializes the Ruby object as text. I have no control over the table structure OR the data format. The tasks in the queue can be recurring. In our test setup, some of the recurring tasks are failing because of stale data. I have to delete these tasks to prevent the error. Such errors are not common, hence I don't want to maintain a normalized shadow table.

+1  A: 

There is no MySQL equivalent. The MySQL REGEXP can be used for matching strings, but not for transforming them.

You can either try to work with stored procedures and a lot of REPLACE/SUBSTRING logic, or do it in your programming language - which should be the easiest option.

But are you sure your data format is well chosen? If you need the order_id, wouldn't it make sense to store it in a different column, so you can put indexes, use joins and the likes?

Konerak
We can't always choose our data format. For instance, when undertaking data migration or importing stuff from another system we frequently have to handle whatever we're given.
APC
Read my updated question for reason why I have such a data.
KandadaBoggu
+2  A: 

Like Konerak said, there is no equivalent of REGEXP_SUBSTR in MySql. You could do what you need using SUBSTRING logic, but it is ugly :

SELECT
  SUBSTRING(lastPart.end, 1, LOCATE(' ', lastPart.end) - 1) AS orderId
FROM
  (
    SELECT
      SUBSTRING(dataset.description, LOCATE('order_id: ', dataset.description) + LENGTH('order_id: ')) AS end
    FROM
      (
        SELECT 'abc order_id: 2 xxxx yyy aa' AS description
        UNION SELECT 'mmm order_id: 3 nn kk yw' AS description
        UNION SELECT 'mmm order_id: 1523 nn kk yw' AS description
      ) AS dataset
    ) AS lastPart

Edit: You could try this user defined function providing access to perl regex in MySql

SELECT 
  PREG_CAPTURE( '/.*order_id:\s(\d+).*/', dataset.description,1)
FROM
  (
    SELECT 'abc order_id: 2 xxxx yyy aa' AS description
    UNION SELECT 'mmm order_id: 3 nn kk yw' AS description
    UNION SELECT 'mmm order_id: 1523 nn kk yw' AS description
  ) AS dataset
madgnome
This is what I am doing. +1 for taking time to write this..
KandadaBoggu
A: 

There is always cure for code ugliness: a dtabase normalization.
if you need different parts of the matter, these parts must be stored se-pa-ra-te-ly. Period.

What prevents you from parsing the data before inserting?

Anyway, I am sure it is cause to be cured, not consequence.

Col. Shrapnel
Updated my question with additional details about the scenario.
KandadaBoggu