ansaurus

Question

Problems title-casing a string in Python

Answer 1

+5 A:

Strings are immutable. They can't be changed. You must create a new string with the changed content. If you want to make every 'j' uppercase:

def make_uppercase_j(char):
    if char == 'j':
        return 'J'
    else:
        return char
name = "markus johansson"
''.join(make_uppercase_j(c) for c in name)

unbeknown 2009-03-04 09:31:16

''.join('J' if c == 'j' else c for c in name)

J.F. Sebastian 2009-03-04 13:10:21

Answer 2

+8 A:

I guess that what you're trying to achieve is:

from string import capwords
capwords(name)

Which yields:

'Markus Johansson'

EDIT: OK, I see you want to tear down a open door. Here's low level implementation.

''.join([char.upper() if prev==' ' else char for char,prev in zip(name,' '+name)])

vartec 2009-03-04 09:31:19

Dude, he wants "real programming not wannabe." :)

Tyson 2009-03-04 09:33:46

Yeah.. Real programmers use integers ;-)

vartec 2009-03-04 09:35:41

All wrong, real programmers use butterflies. Proof: http://xkcd.com/378/

Georg 2009-03-04 09:37:09

Real programmers genetically engineer an army of butterflies for parallel writes to the hard drive.

Tyson 2009-03-04 09:42:28

The OP asks for lower level implementation without `str.title` or `string.capwords`.

J.F. Sebastian 2009-03-04 13:06:08

Answer 3

+1 A:

If I understand your original algorithm correctly, this is what you want to do:

namn = list("markus johansson")

if namn[0] == 'm':
    namn[0] = "M"

count = 0

for i in range(1, len(namn)):
    if namn[i] == " ":
        count = i + 1
    if count and namn[count] == 'j':    
        namn[count] = 'J'

print ''.join(namn)

Of course, there's a million better ways ("wannabe" ways) to do what you're trying to do, like as shown in vartec's answer. :)

As it stands, your code only works for names that start with a J and an M for the first and last names, respectively.

Tyson 2009-03-04 09:37:03

Answer 4

+7 A:

>>> "markus johansson".title()
'Markus Johansson'

Built in string methods are the way to go.

EDIT: I see you want to re-invent the wheel. Any particular reason ? You can choose from any number of convoluted methods like:

' '.join(j[0].upper()+j[1:] for j in "markus johansson".split())

Standard Libraries are still the way to go.

Christian Witts 2009-03-04 10:55:15

Yeah, I didnt know title, great tip. standard library rulez again.

Jiri 2009-03-04 11:12:13

`j[1:]` -> `j[1:].lower()` or just `j.capitalize()`. `str.split()` drops all spaces therefore `' '.join(...)` does different thing compared to `str.title()`.

J.F. Sebastian 2009-03-04 13:03:17

This is the best and simplest method.

Selinap 2009-03-04 13:19:58

Answer 5

A:

"real programming"?

I would use .title(), and I'm a real programmer.

Or I would use regular expressions

re.sub(r"(^|\s)[a-z]", lambda m: m.group(0).upper(), "this   is a set of  words")

This says "If the start of the text or a whitespace character is followed by a lower-case letter" (in English - other languages are likely not supported), then for each match convert the match text to upper-case. Since the match text is the space and the lower-case letter, this works just fine.

If you want it as low-level code then the following works. Here I only allow space as the separator (but you may want to support newline and other characters). On the other hand, "string.lowercase" is internationalized, so if you're in another locale then it will, for the most part, still work. If you don't want that then use string.ascii_lowercase.

import string

def title(s):
    # Capitalize the first character
    if s[:1] in string.lowercase:
        s = s[0].upper() + s[1:]

    # Find spaces
    offset = 0
    while 1:
        offset = s.find(" ", offset)
        # Reached the end of the string or the
        # last character is a space
        if offset == -1 or offset == len(s)-1:
            break

        if s[offset+1:offset+2] in string.lowercase:
            # Is it followed by a lower-case letter?
            s = s[:offset+1] + s[offset+1].upper() + s[offset+2:]
            # Skip the space and the letter
            offset += 2
        else:
            # Nope, so start searching for the next space
            offset += 1

    return s

To elaborate on my comment to this answer, this question can only be an exercise for curiosity's sake. Real names have special capitalization rules: the "van der" in "Johannes Diderik van der Waals" is never capitalized, "Farrah Fawcett-Majors" has the "M", and "Cathal Ó hEochaidh" uses the non-ASCII Ó and h, which modify "Eochaidh" to mean "grandson of Eochaidh".

Andrew Dalke 2009-03-04 11:07:10

It has poor performance for long inputs, say 50,000 words or more. That's because you are re-creating the string for each word. It probably has worse-than-linear big-O complexity. It would probably be more efficient to build the string in a StringIO object.

dangph 2009-03-04 11:51:37

BTW, an easy way to generate a long input for testing: s = 'test ' * 50000

dangph 2009-03-04 11:52:16

Yup, quadratic in the number of substitutions. But 1) this isn't meant as general purpose code 2) the OP wanted character-level code 3) 1000 substitutions is still < 1ms 4) who uses title for real? esp. since i18n, "quotes", etc. mess things up, 5) title on names like "van der Waal" is incorrect.

Andrew Dalke 2009-03-04 12:48:56

Oh, and yes, I was writing that code like a C programmer. ;)

Andrew Dalke 2009-03-04 13:06:00

Answer 6

+1 A:

Plenty of good suggestions, so I'll be in good company adding my own 2 cents :-)

I'm assuming you want something a little more generic that can handle more than just names starting with 'm' and 'j'. You'll probably also want to consider hyphenated names (like Markus Johnson-Smith) which have caps after the hyphen too.

from string import lowercase, uppercase
name = 'markus johnson-smith'

state = 0
title_name = []

for c in name:
    if c in lowercase and not state:
        c = uppercase[lowercase.index(c)]
        state = 1
    elif c in [' ', '-']:
        state = 0
    else:
        state = 1 # might already be uppercase

    title_name.append(c)

print ''.join(title_name)

Last caveat is the potential for non-ascii characters. Using the uppercase and lowercase properties of the string module is good in this case becase their contents change depending on the user's locale (ie: system-dependent, or when locale.setlocale() is called). I know you want to avoid using upper() for this exercise, and that's quite neat... as an FYI, upper() uses the locale controlled by setlocale() too, so the practice of use uppercase and lowercase is a good use of the API without getting too high-level. That said, if you need to handle, say, French names on a system running an English locale, you'll need a more robust implementation.

Jarret Hardie 2009-03-04 11:42:23

'markus johnson-smith'.title() -> 'Markus Johnson-Smith'

J.F. Sebastian 2009-03-04 12:41:45

`uppercase[lowercase.index(c)]` makes it O(N**2). Use `c.islower()`, `c.lower()`, etc.

J.F. Sebastian 2009-03-04 12:56:32

Totally agree... just trying to help him avoid the use of library functions as per his question.

Jarret Hardie 2009-03-04 13:32:51

Answer 7

+5 A:

`string.capwords()` (defined in `string.py`)

# Capitalize the words in a string, e.g. " aBc  dEf " -> "Abc Def".
def capwords(s, sep=None):
    """capwords(s, [sep]) -> string

    Split the argument into words using split, capitalize each
    word using capitalize, and join the capitalized words using
    join. Note that this replaces runs of whitespace characters by
    a single space.

    """
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))

`str.title()` (defined in `stringobject.c`)

PyDoc_STRVAR(title__doc__,
"S.title() -> string\n\
\n\
Return a titlecased version of S, i.e. words start with uppercase\n\
characters, all remaining cased characters have lowercase.");
static PyObject*
string_title(PyStringObject *self)
{
    char *s = PyString_AS_STRING(self), *s_new;
    Py_ssize_t i, n = PyString_GET_SIZE(self);
    int previous_is_cased = 0;
    PyObject *newobj = PyString_FromStringAndSize(NULL, n);
    if (newobj == NULL)
     return NULL;
    s_new = PyString_AsString(newobj);
    for (i = 0; i < n; i++) {
     int c = Py_CHARMASK(*s++);
     if (islower(c)) {
      if (!previous_is_cased)
          c = toupper(c);
      previous_is_cased = 1;
     } else if (isupper(c)) {
      if (previous_is_cased)
          c = tolower(c);
      previous_is_cased = 1;
     } else
      previous_is_cased = 0;
     *s_new++ = c;
    }
    return newobj;
}

`str.title()` in pure Python

class String(str):
    def title(self):
        s = []
        previous_is_cased = False
        for c in self:
            if c.islower():
               if not previous_is_cased:
                  c = c.upper()
               previous_is_cased = True
            elif c.isupper():
               if previous_is_cased:
                  c = c.lower()
               previous_is_cased = True
            else:
               previous_is_cased = False
            s.append(c)
        return ''.join(s)

Example:

>>> s = ' aBc  dEf '
>>> import string
>>> string.capwords(s)
'Abc Def'
>>> s.title()
' Abc  Def '
>>> s
' aBc  dEf '
>>> String(s).title()
' Abc  Def '
>>> String(s).title() == s.title()
True

J.F. Sebastian 2009-03-04 12:08:36

Answer 8

+1 A:

If you're looking into more generic solution for names, you should also look at following examples:

John Adams-Smith
Joanne d'Arc
Jean-Luc de'Breu
Donatien Alphonse François de Sade

Also some parts of the names shouldn't start with capital letters, like:

Herbert von Locke
Sander van Dorn
Edwin van der Sad

so, if you're looking into creating a more generic solution, keep all those little things in mind.

(This would be a perfect place to run a test-driven development, with all those conditions your method/function must follow).

kender 2009-03-04 13:35:03

This isn't the place for test-driven development - this is the place to review your process and figure out why the names were all coming in as lowercase, and change it so that you don't need post-hoc code. Even with post-hoc code, make sure _any_ ambiguity is resolved by humans.

Andrew Dalke 2009-03-04 14:01:52

ansaurus

tags:

views:

answers:

Problems title-casing a string in Python

`string.capwords()` (defined in `string.py`)

`str.title()` (defined in `stringobject.c`)

`str.title()` in pure Python

related questions

ansaurus

tags:

views:

answers:

Problems title-casing a string in Python

string.capwords() (defined in string.py)

str.title() (defined in stringobject.c)

str.title() in pure Python

related questions

`string.capwords()` (defined in `string.py`)

`str.title()` (defined in `stringobject.c`)

`str.title()` in pure Python