Split String At Nth Occurrence Of A Given Character

September 30, 2023

Is there a Pythonic way to split a string after the nth occurrence of a given delimiter? Given the string '20_231_myString_234', it should be split into (with the delimiter being '_',

Solution 1:

>>> text = '20_231_myString_234'
>>> n = 2
>>> groups = text.split('_')
>>> '_'.join(groups[:n]), '_'.join(groups[n:])
('20_231', 'myString_234')

This seems like the most readable way; the alternative is a regex.
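
The split-and-join approach can be wrapped in a small reusable function. This is only a minimal sketch, not part of the original answer: the name split_at_nth is hypothetical, and raising ValueError when the delimiter occurs fewer than n times is an assumption about the desired behaviour.

```python
def split_at_nth(text, delim, n):
    """Split text at the nth occurrence of delim (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    groups = text.split(delim)
    # With fewer than n delimiters there is nothing to split at.
    if len(groups) <= n:
        raise ValueError("delimiter occurs fewer than n times")
    return delim.join(groups[:n]), delim.join(groups[n:])


print(split_at_nth('20_231_myString_234', '_', 2))
# ('20_231', 'myString_234')
```

Because the pieces are rejoined with the same delimiter, this also works for delimiters longer than one character.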

Solution 2:

Using re to get a regex of the form ^((?:[^_]*_){n-1}[^_]*)_(.*) where n is a variable:

import re

n = 2
s = '20_231_myString_234'
m = re.match(r'^((?:[^_]*_){%d}[^_]*)_(.*)' % (n - 1), s)
if m:
    print(m.groups())

or have a nice function:

import re
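def nthofchar(s, c, n):
    # Escape the delimiter so that characters such as '.' are matched literally.
    d = re.escape(c)
    regex = r'^((?:[^%s]*%s){%d}[^%s]*)%s(.*)' % (d, d, n - 1, d, d)
    m = re.match(regex, s)
    # Return (head, tail), or an empty tuple if there are fewer than n delimiters.
    return m.groups() if m else ()

s = '20_231_myString_234'
print(nthofchar(s, '_', 2))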

Or without regexes, using iterative find:

def nth_split(s, delim, n):
    p, c = -1, 0
    # Advance to the nth occurrence; index() raises ValueError if there are fewer than n.
    while c < n:
        p = s.index(delim, p + 1)
        c += 1
    return s[:p], s[p + len(delim):]

s1, s2 = nth_split('20_231_myString_234', '_', 2)
print(s1, ":", s2)

Solution 3:

I like this solution because it works without any real regex pattern and can easily be adapted to another "nth" or delimiter.

import re

string = "20_231_myString_234"
occur = 2  # the occurrence at which to split

indices = [x.start() for x in re.finditer("_", string)]
part1 = string[0:indices[occur-1]]
part2 = string[indices[occur-1]+1:]

print (part1, ' ', part2)
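
The same idea can be packaged as a function. This is a hedged sketch: the name split_on_occurrence is hypothetical, and re.escape is added on the assumption that the delimiter should be matched literally even when it is a regex metacharacter such as '.'.

```python
import re


def split_on_occurrence(string, delim, occur):
    """Split string at the occur-th match of the literal delimiter."""
    # re.escape makes metacharacters such as '.' match literally.
    matches = list(re.finditer(re.escape(delim), string))
    if not 1 <= occur <= len(matches):
        raise ValueError("not enough occurrences of the delimiter")
    m = matches[occur - 1]
    # m.start() and m.end() bracket the delimiter, so multi-character
    # delimiters are handled as well.
    return string[:m.start()], string[m.end():]


print(split_on_occurrence("20_231_myString_234", "_", 2))
print(split_on_occurrence("a.b.c.d", ".", 3))
```

Without the escaping, a delimiter of "." would match every character and the indices would point at the wrong positions.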

Solution 4:

I thought I would contribute my two cents. The second parameter to split(), maxsplit, limits how many splits are performed, so everything after the nth delimiter stays in one piece:

def split_at(s, delim, n):
    # r is everything after the nth delimiter.
    r = s.split(delim, n)[n]
    return s[:-len(r)-len(delim)], r

On my machine, the two good answers by @perreal, iterative find and regular expressions, actually measure 1.4 and 1.6 times slower (respectively) than this method.

It's worth noting that it can become even quicker if you don't need the initial bit. Then the code becomes:

def remove_head_parts(s, delim, n):
    return s.split(delim, n)[n]

Not so sure about the naming, I admit, but it does the job. Somewhat surprisingly, it is 2 times faster than iterative find and 3 times faster than regular expressions.

I put up my testing script online. You are welcome to review and comment.
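
The testing script itself is not reproduced here. A rough comparison could be sketched with the standard timeit module along the following lines. The three functions are restated from the answers above so that the block runs on its own, and the call count of 100,000 is an arbitrary choice. The resulting ratios will vary by machine and Python version.

```python
import re
import timeit


def nth_split(s, delim, n):
    # Iterative find, as in Solution 2.
    p, c = -1, 0
    while c < n:
        p = s.index(delim, p + 1)
        c += 1
    return s[:p], s[p + 1:]


def nthofchar(s, c, n):
    # Regular expression, as in Solution 2.
    d = re.escape(c)
    regex = r'^((?:[^%s]*%s){%d}[^%s]*)%s(.*)' % (d, d, n - 1, d, d)
    m = re.match(regex, s)
    return m.groups() if m else ()


def split_at(s, delim, n):
    # split() with maxsplit, as in this answer.
    r = s.split(delim, n)[n]
    return s[:-len(r) - len(delim)], r


s = '20_231_myString_234'
for func in (nth_split, nthofchar, split_at):
    # Time 100,000 calls of each function (arbitrary count).
    seconds = timeit.timeit(lambda: func(s, '_', 2), number=100000)
    print('%-10s %.3f s' % (func.__name__, seconds))
```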

Solution 5:

It depends on the pattern of the strings you are splitting. If, for example, the first two elements are always numbers, you can build a regular expression and use the re module, which can split your string as well.
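
As an illustration of that idea, here is a minimal sketch that assumes every input starts with two runs of digits separated by underscores, as in the example string. The pattern and the name split_numeric_prefix are assumptions and are not given in the answer.

```python
import re

# Two digit groups joined by '_', then '_', then the rest of the string.
PATTERN = re.compile(r'(\d+_\d+)_(.*)')


def split_numeric_prefix(s):
    m = PATTERN.match(s)
    # None signals that the string does not fit the assumed pattern.
    return m.groups() if m else None


print(split_numeric_prefix('20_231_myString_234'))
# ('20_231', 'myString_234')
```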
