SQL help, parsing a field

AsSiMiLaTeD
AsSiMiLaTeD Posts: 11,725
edited December 2008 in The Clubhouse
I need some help parsing a field in SQL, hoping some here has done this before. I have a field separated by periods, and I need each section parsed into a separate field. The trick is that the number of characters in each section can differ, so I can't use the substring function. here's an example

Raw fields:

****.1234.123456
****.123456789.12345
****.12345.1234

would need to look like this when parsed.

field1 field2 field3
**** 1234 123456
**** 123456789 12345
**** 12345 1234

Anyone have anything that can do that Logic?
Post edited by AsSiMiLaTeD on

Comments

  • VSchneider
    VSchneider Posts: 443
    edited December 2008
    What platform is your DB server on? Sometimes it's much easier to do things outside of SQL.

    That being said, let me think for a few...

    Also, what's the database flavor? Is the string always a triplet?
  • AsSiMiLaTeD
    AsSiMiLaTeD Posts: 11,725
    edited December 2008
    SQL Server 2005. The field is always a triplet, but there are a couple other possibilities I should mention:

    null
    ****..
    ****.0.0
  • unc2701
    unc2701 Posts: 3,587
    edited December 2008
    I agree w/ Vschneider- for example SAS has a function for something like that, so I could just use this:
    scan(var,1,".") would grab the first field delimited by a period.

    W/ MS SQL Server, you could probably do it w/ a combination of CHARINDEX and SUBSTRING
    Gallo Ref 3.1 : Bryston 4b SST : Musical fidelity CD Pre : VPI HW-19
    Gallo Ref AV, Frankengallo Ref 3, LC60i : Bryston 9b SST : Meridian 565
    Jordan JX92s : MF X-T100 : Xray v8
    Backburner:Krell KAV-300i
  • VSchneider
    VSchneider Posts: 443
    edited December 2008
    Here's an example for Sybase (my platform of choice), by MSSQL 2005 will be similar:
    1> declare @str_length int,
    2>         @dot_index int,
    3>         @dot_char char(1),
    4>         @string varchar(255),
    5>         @field1 varchar(255)
    6>
    7> select @string = '****.123456789.12345', @dot_char = '.'
    8> print 'input string = "%1!", field separator = "%2!"', @string, @dot_char
    9>
    10> select @str_length = char_length(@string),
    11>        @dot_index = charindex(@dot_char,@string)
    12>
    13> print 'str_length = %1!, separator index = %2!', @str_length, @dot_index
    14>
    15> select @field1 = substring(@string, 1, @dot_index-1),
    16>        @string = substring(@string, @dot_index+1, @str_length)
    17>
    18> print 'field1 = "%1!", parsed string = "%2!"', @field1, @string
    19> go
    input string = "****.123456789.12345", field separator = "."
    str_length = 19, separator index = 4
    field1 = "****", parsed string = "123456789.12345"
    1>
    

    Check the print statements in the code and the results. It works for ".123456789.12345" also, returning "" and "123456789.12345".

    - Val
  • MKZ
    MKZ Posts: 1,068
    edited December 2008
    Did you try LTRIM,RTRIM?
  • VSchneider
    VSchneider Posts: 443
    edited December 2008
    LTRIM/RTRIM only remove leading or trailing spaces, AFIK

    EDIT: BTW, I now have a looping version of that routine with some error checking to take care of the empty string case, too, but I think you got the idea.

    If the data is simply a column in a MSSQL table, you can write a cursor to process the entire table. That won't be pretty but will get the job done.

    - Val
  • unc2701
    unc2701 Posts: 3,587
    edited December 2008
    Ok.
    Field1 would be:
    SUBSTRING(RAW,1,CHARINDEX(".",RAW,1)-1)

    Temp1 would be:
    SUBSTRING(RAW,CHARINDEX(".",RAW,1)+1)

    Field2 would be:
    SUBSTRING(Temp1,1,CHARINDEX(".",Temp1,1)-1)

    Temp2:
    SUBSTRING(Temp1,CHARINDEX(".",Temp1,1)+1)

    Field3 would be:
    SUBSTRING(Temp2,1,CHARINDEX(".",Temp2,1)-1)

    You can nest them to get rid of the temp vars. I'm assuming that if length is left off SUBSTRING it'll take the rest of the string from the starting point (not sure if that's true on your platform)... and you'll have to deal w/ your special cases.
    Gallo Ref 3.1 : Bryston 4b SST : Musical fidelity CD Pre : VPI HW-19
    Gallo Ref AV, Frankengallo Ref 3, LC60i : Bryston 9b SST : Meridian 565
    Jordan JX92s : MF X-T100 : Xray v8
    Backburner:Krell KAV-300i
  • VSchneider
    VSchneider Posts: 443
    edited December 2008
    errr... That's exactly what the code above does with some print statements to show how it is working.
  • MKZ
    MKZ Posts: 1,068
    edited December 2008
    My bad :o

    Please try this and see if it works.

    CREATE TABLE #Table1 (Col1 varchar(30))

    INSERT INTO #Table1 (Col1) Values ('****.1234.123456')

    INSERT INTO #Table1 (Col1) Values ('****.123456789.12345')

    INSERT INTO #Table1 (Col1) Values ('****.12345.1234')

    select

    rtrim(replace(parsename(replace(Col1, '.', ' '), 1), '.', ' ')) as col1

    from

    #Table1

    DROP TABLE #Table1
  • VSchneider
    VSchneider Posts: 443
    edited December 2008
    Cool, this compiles in MS SQL 2005, and returns the original string with spaces in place of dots. But how about separating the Col1 into individual fields?
  • unc2701
    unc2701 Posts: 3,587
    edited December 2008
    VSchneider wrote: »
    errr... That's exactly what the code above does with some print statements to show how it is working.

    heh. oops, I totally didn't read that mess of code. Looks like you do have to give substring a length.
    Gallo Ref 3.1 : Bryston 4b SST : Musical fidelity CD Pre : VPI HW-19
    Gallo Ref AV, Frankengallo Ref 3, LC60i : Bryston 9b SST : Meridian 565
    Jordan JX92s : MF X-T100 : Xray v8
    Backburner:Krell KAV-300i
  • MKZ
    MKZ Posts: 1,068
    edited December 2008
    VSchneider wrote: »
    Cool, this compiles in MS SQL 2005, and returns the original string with spaces in place of dots. But how about separating the Col1 into individual fields?

    That means rtrim doesn't work in 2005 I think. I do not have 2005 to test with. There might be some other way.
  • AsSiMiLaTeD
    AsSiMiLaTeD Posts: 11,725
    edited December 2008
    I'll take a look at you guys' posts and report back, thanks
  • Sami
    Sami Posts: 4,634
    edited December 2008
    Just to show why you should consider doing the work in other than SQL environment...here's C# code that takes in a string and separates it into substrings based on regular expression, in this case ".". Live's a little easier when you have more powerful tools to work with than just SQL.
    using System.Text.RegularExpressions;
    
    String[] GetFieldsSeparatedByPeriod(String Input)
    {
       Regex re = new Regex(".");
       return re.Split(Input);
    }
    
  • MKZ
    MKZ Posts: 1,068
    edited December 2008
    Saw it online and I thought you might get an idea.

    declare @string varchar(500)
    set @string = '****.12345.1234'


    declare @pos int
    declare @piece varchar(500)

    -- Need to tack a delimiter onto the end of the input string if one doesn't exist
    if right(rtrim(@string),1) <> '.'
    set @string = @string + '.'

    set @pos = patindex('%.%' , @string)
    while @pos <> 0
    begin
    set @piece = left(@string, @pos - 1)

    -- You have a piece of data, so insert it, print it, do whatever you want to with it.
    print cast(@piece as varchar(500))

    set @string = stuff(@string, 1, @pos, '')
    set @pos = patindex('%.%' , @string)
    end
  • Kris Siegel
    Kris Siegel Posts: 309
    edited December 2008
    I would recommend doing the parsing in something other than SQL as this doesn't lend itself very well to SQL. Perhaps C#?
    Sami wrote: »
    Just to show why you should consider doing the work in other than SQL environment...here's C# code that takes in a string and separates it into substrings based on regular expression, in this case ".". Live's a little easier when you have more powerful tools to work with than just SQL.
    using System.Text.RegularExpressions;
    
    String[] GetFieldsSeparatedByPeriod(String Input)
    {
       Regex re = new Regex(".");
       return re.Split(Input);
    }
    
    This actually won't work. In Regular Expressions the . matches all characters so you'll end up with an array with an individual character in each string instance (so roughly 15 fields in my test of "XXXX.XXXX.XXXX").

    You could stick with Regular Expressions and do this:
    using System.Text.RegularExpressions;
    
    String[] GetFieldsSeparatedByPeriod(String Input)
    {
       Regex re = new Regex("\\.");
       return re.Split(Input);
    }
    
    or go the easy route:
    string[] Str = MyString.Split('.');
    

    Not sure which method is faster as I believe the Split member of String uses Regular Expressions but I'm not 100% sure.
  • Sami
    Sami Posts: 4,634
    edited December 2008
    Kris, you're correct about period, it is one of the metacharacters. The reason I used Regular Expressions is because it can do much more than just splitting. There's really only one extra line to use versus the string method so it's not anymore complicated.
    Regex re = new Regex("\\.");
    String[] str = re.Split(MyString);