Hi all,

I'm trying to extract functions and function headers from some source code files. Here's an example of the type of code:

################################################################################
# test module
#
# Description : Test module
#
DATABASE test

###
# Global Vars
GLOBALS
    DEFINE G_test_string    STRING
END GLOBALS

###
# Modular Vars
DEFINE M_counter            INTEGER

###
# Constants
CONSTANT MAX_ARR_SIZE = 100

##################################
# Alternative header
##################################
FUNCTION test_function_1()
    DEFINE  F_x     INTEGER

    LET F_x = 1

    RETURN F_x
END FUNCTION

###################################
# Function:
#   This is a test function
#
# Parameters:
#   in - test
#
# Returns:
#   out - result
#
FUNCTION test_function_2( P_in_var )
    DEFINE  P_in_var    INTEGER

    DEFINE  F_out_var   INTEGER


    LET F_out_var = P_in_var

    RETURN F_out_var
END FUNCTION

FUNCTION test_init_array()
    DEFINE  F_array     ARRAY[ MAX_ARR_SIZE ] OF INTEGER
    DEFINE  F_element   INTEGER

    FOR F_element = 1 TO MAX_ARR_SIZE

        LET F_array[ F_element ] = F_element * F_element

    END FOR

END FUNCTION

Functions may or may not have a header above them. I'm trying to capture the function source, function header, function name and any parameters passed into the function in groups. Here's the expression I came up with (I'm using .NET regex and have been testing with Regex Hero):

^([#]{0,1}.*?)(FUNCTION\s+(.*?)[(](.*?)[)].*?END FUNCTION) 

This seems to work OK for all but the first function (test_function_1) in the file. The first group for test_function_1 captures everything from the first line (the top of the source file) up to the point where the FUNCTION of test_function_1 begins. I realise this is because other comments in the file also start with #, but I only want to capture the function header.
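For anyone who wants to reproduce the behaviour described above, here is a rough sketch using Python's `re` module as a stand-in for .NET regex. It rests on two assumptions: that Multiline and Singleline were both enabled (the question does not say which options were used), and that Python treats these constructs the same way .NET does for this input. `SOURCE` is a trimmed copy of the sample module and `ORIGINAL_RE` is just an illustrative name.

```python
import re

# Trimmed copy of the sample module from the question.
SOURCE = """\
# test module
DATABASE test

###
# Constants
CONSTANT MAX_ARR_SIZE = 100

##################################
# Alternative header
##################################
FUNCTION test_function_1()
    DEFINE  F_x     INTEGER

    LET F_x = 1

    RETURN F_x
END FUNCTION

###################################
# Function:
#   This is a test function
#
FUNCTION test_function_2( P_in_var )
    DEFINE  P_in_var    INTEGER

    RETURN P_in_var
END FUNCTION
"""

# The expression from the question, unchanged.
# Assumption: Multiline and Singleline were both on (re.MULTILINE | re.DOTALL here).
ORIGINAL_RE = re.compile(
    r"^([#]{0,1}.*?)(FUNCTION\s+(.*?)[(](.*?)[)].*?END FUNCTION)",
    re.MULTILINE | re.DOTALL,
)

matches = list(ORIGINAL_RE.finditer(SOURCE))
first, second = matches[0], matches[1]

# Group 1 of the first match starts at the very top of the file,
# because the lazy .*? is allowed to cross line breaks until FUNCTION appears.
print(repr(first.group(1)[:30]))

# Group 1 of the second match only holds that function's own header
# (plus the blank line before it), since the scan resumes after END FUNCTION.
print(repr(second.group(1)))
```

The first group swallows the module comment, the DATABASE line and the constants block, while the second one looks fine, which matches the symptom in the question.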

+1  A: 

If I understand correctly, the problem is matching only the lines that start with #. To do that, you could turn on the RegexOptions.Multiline flag and match the function header with

((?:^#.*\s)*)

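Edit: For this to work, you'd have to switch OFF RegexOptions.Singleline (otherwise . also matches newlines, so #.* runs past the end of the comment line) and replace .*? with [\s\S]*? in your function body part, so the body can still span several lines.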
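Putting the header group and the edit together, a minimal sketch of the combined expression might look like this. It is written with Python's `re` module rather than .NET, on the assumption that both engines behave the same for these constructs; `re.MULTILINE` stands in for RegexOptions.Multiline, and leaving out `re.DOTALL` corresponds to Singleline being off. The helper name `extract_functions` and the dictionary keys are made up for illustration.

```python
import re

# Trimmed sample in the style of the question's source file.
SOURCE = """\
###
# Constants
CONSTANT MAX_ARR_SIZE = 100

##################################
# Alternative header
##################################
FUNCTION test_function_1()
    DEFINE  F_x     INTEGER

    LET F_x = 1

    RETURN F_x
END FUNCTION

FUNCTION test_function_2( P_in_var )
    DEFINE  P_in_var    INTEGER

    RETURN P_in_var
END FUNCTION
"""

# Group 1: zero or more consecutive lines starting with '#'.
# Group 2: the whole function; group 3: its name; group 4: its parameters.
# [\s\S]*? replaces .*? in the body because '.' no longer matches newlines.
FUNCTION_RE = re.compile(
    r"((?:^#.*\s)*)"
    r"(FUNCTION\s+(.*?)[(](.*?)[)][\s\S]*?END FUNCTION)",
    re.MULTILINE,
)


def extract_functions(text):
    """Return one dict per function found in text (hypothetical helper)."""
    results = []
    for match in FUNCTION_RE.finditer(text):
        header, source, name, params = match.groups()
        results.append({
            "header": header,          # empty string when there is no header
            "source": source,
            "name": name.strip(),
            "params": params.strip(),
        })
    return results


for func in extract_functions(SOURCE):
    print(func["name"], "| params:", repr(func["params"]),
          "| header lines:", len(func["header"].splitlines()))
```

With this sample, the header of test_function_1 should contain only the three comment lines directly above it, and test_function_2 should come back with an empty header. Note that a blank line between the comment block and FUNCTION would stop the header from being captured, since the header group only accepts consecutive # lines.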

Jens
Brilliant. Thanks for that.
llihp