views:

4473

answers:

20

In the spirit of

what are some common mistakes made by Scala developers, and how can we avoid them?

Also, as the biggest group of new Scala developers come from Java, what specific pitfalls they have to be aware of?

For example, one often cited problem Java programmers moving to Scala make is use a procedural approach when a functional one would be more suitable in Scala. What other mistakes e.g. in API design newcomers should try to avoid.

+24  A: 

From Implicit conversions - magical and demonic (emphasis mine):

In Scala == is structural equality test method, unlike in Java, where it is reference equality test. It takes one argument type of Any, which means you can compare your object against any object.

I have had problems with this simply, because you don't always compare things you should.

For example, I might have a tuple of Ints, which has name 'boom'.
And I have another tuple 'doom', and I'd like to compare their first elements, i.e.

 boom._1 == doom._1. 

But I might be a bit sleepy, and write

 boom == doom._1, 

i.e. comparing a tuple against an Int.
Compiler won't say that you are comparing completely different objects. It just sings logically incorrect bytecode. It might be very hard to notice that the comparison isn't ever going to give a true value just by testing it few times.

Ok, that's bad enough as it is, but when we mix implicit conversions with this magical soup, we are really doomed.
An example: String has no reverse method. Therefore it has been extended with RichString.
When you say "yay" reverse, you get a RichString, not String.
Guess what,

 "yay".reverse == "yay" 

won't give you true. Happy debugging times ahead, my friend.

Note: regarding that specific last example, ticket Scala 1495 says it should be fixed in Scala2.8.


Since this answers, the comments mention:

VonC
Oh man, that latter one is just a killer.
Lewisham
Seems like some compiler warnings (builtin or via plugin) could help with this one...
Travis
I recommend to avoid Any.== in favour of the type safe alternative from Scalaz. See the doc comment on scalaz.Equal: http://code.google.com/p/scalaz/source/browse/trunk/core/src/main/scala/scalaz/Equal.scala
retronym
@retronym: thank you, very interesting link.
VonC
In 2.8.0, this is true: "yay".reverse == "yay", reverse() returns a String - this seems to be a result of the new collection framework design.
Dimitris Andreou
@retronym: your link is (now?) invalid.
user unknown
@user: I have fixed the `Equal` link mentioned by retronym and included that comment in my answer.
VonC
+39  A: 

Keeping with return null idiom instead of using the Option trait to indicate that a method call may not return a value:

val p = productService.find(RIC, "MSFT.O")
if (p == null)
  //do something 
else
  //do something else

Should be replaced by:

productService.find(RIC, "MSFT.O") match {
  case None => //do something
  case Some(p) => //do something else
}

ie. the ProductService.find method should have the signature:

def find(cType: CodeType, code: String) : Option[Product]  

edit : NOTE one thing I've noticed is the unnecessary resolution (via code-forking) which is a common hangover from Java. For example, rather than the above I might do this instead:

productService.find(RIC, "MSFT.O").toRight(None).fold(doSomething, doSomethingElse)

Or one of the many functional alternatives!

oxbow_lakes
It's better to get NONE Microsoft than a null Microsoft.
pjp
It is - you are much less likely to end up with an NPE and your API explictly tells you that you might not get anything back
oxbow_lakes
+25  A: 

Syntactic Mistake

Thinking of "yield" as something like "return". People will try:

for(i <- 0 to 10) {
  if (i % 2 == 0)
    yield i
  else
    yield -i
}

Where the correct expression is:

for(i <- 0 to 10) 
yield {
  if (i % 2 == 0)
    i
  else
    -i
}
Daniel
+7  A: 

Misusage Mistake

Trying to pattern match a regex against a string assuming the regex is not bounded:

val r = """(\d+)""".r
val s = "--> 5 <---"
s match {
  case r(n) => println("This won't match")
  case _ => println("This will")
}

The problem here is that, when pattern matching, Scala's Regex acts as if it were bounded begin and end with "^" and "$". The way to get that working is:

val r = """(\d+)""".r
val s = "--> 5 <---"
r findFirstIn s match {
  case Some(n) => println("Matches 5 to "+n)
  case _ => println("Won't match")
}

Or just making sure the pattern will match any prefix and suffix:

val r = """.*?(\d+).*""".r
val s = "--> 5 <---"
s match {
  case r(n) => println("This will match the first group of r, "+n+", to 5")
  case _ => println("Won't match")
}
Daniel
Use `.*?` instead of `.*` - it's much more performant if, subsequent to it, long matches are expected. Each occurrence `.*` will make the matcher go to the end of the input, and then backtrack.
Dimitris Andreou
I was about to say my point wasn't about regex patterns, but about Scala pattern, but you are completely right. If I'm going to provide an example that may be used as model, I'd better do it right. So, fixed in the code.
Daniel
As a side note, this may appear on DZone, but it is now too late to change it there.
Daniel
+18  A: 

Misusage Mistake

Using the trait Application for anything but the most trivial use. The problem with

object MyScalaApp extends Application {  
  // ... body ...
}

is that body executes inside a singleton object initialization. First, the execution of singletons initialization is synchronized, so your whole program can't interact with other threads. Second, JIT won't optimize it, thus making your program slower than necessary.

By the way, no interaction with other threads means you can forget about testing GUI or Actors with Application.

More information can be found on this excellent post by Jorge Ortiz.

Daniel
+12  A: 

Misusage and Syntactic Mistake

Using scala.xml.XML.loadXXX for everything. This parser will try to access external DTDs, strip comments, and similar things. There is an alternative parser in scala.xml.parsing.ConstructingParser.fromXXX.

Also, forgetting spaces around the equal sign when handling XML. This:

val xml=<root/>

really means:

val xml.$equal$less(root).$slash$greater

This happens because operators are fairly arbitrary, and Scala uses the fact that alphanumeric character must be separated from non-alphanumeric characters by an underscore in a valid identifier to be able to accept expressions such as "x+y" without assuming this is a single identifier. Note that "x_+" is a valid identifier, though.

So, the way to write that assignment is:

val xml = <root/>
Daniel
Thankyou! I had no idea about the ConstructingParser!
Chris Hagan
+11  A: 

Design Mistake

Careless use of implicits. Implicits can be very powerful, but care must be taken not to use implicit parameters of common types or implicit conversions between common types.

For example, making an implicit such as:

implicit def string2Int(s: String): Int = s.toInt

is a very bad idea because someone might use a string in place of an Int by mistake. In cases where there's use for that, it's simply better to use a class:

case class Age(n: Int)
implicit def string2Age(s: String) = Age(s.toInt)
implicit def int2Age(n: Int) = new Age(n)
implicit def age2Int(a: Age) = a.n

This will let you freely combine Age with String or Int, but never String with Int.

Likewise, when using implicit parameters, never do something like:

case class Person(name: String)(implicit age: Int)

Not only this will make it easier to have conflicts of implicit parameters, but it might result in an implicit age being passed unnoticed to something expecting an implicit Int of something else. Again, the solution is to use specific classes.

Another problematic implicit usage is being operator-happy with them. You might think "~" is the perfect operator for string matching, but others may use it for things like matrix equivalence, parser concatenation, etc. So, if you are going to provide them, make sure it's easy to isolate the scope of their usage.

Daniel
One of my colleagues got carried away with implicit typedefs.
pjp
+8  A: 

Style Mistake

Using while. It has its usages, but, most of the time, a solution with for-comprehension is better.

Speaking of for-comprehensions, using them to generate indices is a bad idea too. Instead of:

def matchingChars(string: String, characters: String) = {
  var m = ""
  for(i <- 0 until string.length)
    if ((characters contains string(i)) && !(m contains string(i)))
      m += string(i)
  m
}

Use:

def matchingChars(string: String, characters: String) = {
  var m = ""
  for(c <- string)
    if ((characters contains c) && !(m contains c))
      m += c
  m
}

If one needs to return an index, the pattern below can be used instead of iterating over indices. It might be better applied over a projection (Scala 2.7) or view (Scala 2.8), if performance is of concern.

def indicesOf(s: String, c: Char) = for {
  (sc, index) <- s.zipWithIndex
  if c == sc
} yield index    
Daniel
Daniel - why is using a for comprehension to generate indices a bad idea?
oxbow_lakes
An index is a pointer. It's a pointer over an array, instead of directly into memory, but it is a pointer. Everything bad about pointer manipulation applies to indices. It subjects your code to pointer math errors, bound errors, etc. A for-comprehension can generate the elements themselves, and that's less prone to errors
Daniel
+24  A: 

Forgetting that Range is non-strict (in Scala 2.7)

Ranges are not evaluated until needed, which offen causes confusion when using the to function as to returns a Range.

Here's an example where we create three random numbers to use:

$ scala
Welcome to Scala version 2.7.5.final....
Type in expressions to have them evaluated.
Type :help for more information.

scala> val r = new scala.util.Random
r: scala.util.Random = scala.util.Random@36ccf41c

scala> val nums = for(i <- 1 to 3) yield r.nextInt(100)       
nums: RandomAccessSeq.Projection[Int] = RangeM(17, 79, 50)

This looks as if we have 17, 79, and 50. But when we print them again and again...

scala> println(nums)
RangeM(40, 97, 81)

scala> println(nums)
RangeM(41, 75, 86)

...we get different results because the to in 1 to 3 is a Range which is "non strict".

If you want a strict list of numbers you can add toList (or force):

scala> val nums = for(i <- 1 to 3 toList) yield r.nextInt(100) 
nums: List[Int] = List(74, 31, 90)

scala> nums
res2: List[Int] = List(74, 31, 90)

scala> nums
res4: List[Int] = List(74, 31, 90)
Richard Dallaway
Thanks, that was on my list, but I forgot to mention it. However, Range is not "lazy" but "non-strict". Specifically, as you indicated, it will recompute a map/flatMap/filter/whatever function each time it is used. A "lazy" definition would defer computing it, but compute only once.
Daniel
Thank you Daniel - I've corrected the entry above.
Richard Dallaway
On Scala 2.8, by the way, Range is now strict.
Daniel
Thank you Daniel - I've updated the title of the question
Richard Dallaway
+6  A: 

Design Mistake

Badly designing equality methods. Specifically:

  • Trying to change "==" instead of "equals" (which gives you "!=" for free).

  • Defining it as

    def equals(other: MyClass): Boolean
    

    instead of

    override def equals(other: Any): Boolean
    
  • Forgetting to override hashCode as well, to ensure that if a == b then a.hashCode == b.hashCode (the reverse proposition need not be valid).

  • Not making it commutative: if a == b then b == a. Particularly think of subclassing -- does the superclass knows how to compare against a subclass it doesn't even know exists? Look up canEquals if needed.

  • Not making it transitive: if a == b and b == c then a == c.

Daniel
The last 3 bullet points have nothing to do with Scala over Java but your point about the override def is a good one
oxbow_lakes
There's a series of questions with the same subject as this, and their focus is common errors, not common errors when changing languages. I assume this question has the same intent. Perhaps a better redaction of the question is in order? It *is* a community wiki.
Daniel
+8  A: 
Daniel
+14  A: 

Misusage Mistake

Thinking of var and val as fields.

Scala enforces the Uniform Access Principle, by making it impossible to refer to a field directly. All accesses to any field are made through getters and setters. What val and var actually do is define a field, a getter for that field, and, for var, a setter for that field.

Java programmers will often think of var and val definitions as fields, and get surprised when they discover that they share the same namespace as their methods, so they can't reuse their names. What share the same namespace is the automatically defined getter and setter, not the field. Many times they then try to find a way to get to the field, so that they can get around that limitation -- but there's no way, or the uniform access principle would be violated.

Another consequence of it is that, when subclassing, a val can override a def. The other way around is not possible, because a val adds the guarantee of immutability, and def doesn't.

There's no guideline on what to call private getters and setters when you need to override it for some reason. Scala compiler and library code often use nicknames and abbreviation for private values, as opposed to fullyCamelNamingConventions for public getters and setters. Other suggestions include renaming, singletons inside an instance, or even subclassing. Examples of these suggestions:

Renaming

class User(val name: String, initialPassword: String) {
  private lazy var encryptedPassword = encrypt(initialPassword, salt)
  private lazy var salt = scala.util.Random.nextInt

  private def encrypt(plainText: String, salt: Int): String = { ... }
  private def decrypt(encryptedText: String, salt: Int): String = { ... }

  def password = decrypt(encryptedPassword, salt)
  def password_=(newPassword: String) = encrypt(newPassword, salt)
}

Singleton

class User(initialName: String, initialPassword: String) {
   private object fields {
     var name: String = initialName;
     var password: String = initialPassword;
   }
   def name = fields.name
   def name_=(newName: String) = fields.name = newName
   def password = fields.password
   def password_=(newPassword: String) = fields.password = newPassword
 }

alternatively, with a case class, which will automatically define methods for equality, hashCode, etc, which can then be reused:

class User(name0: String, password0: String) {
  private case class Fields(var name: String, var password0: String)
  private object fields extends Fields(name0, password0)


  def name = fields.name
  def name_=(newName: String) = fields.name = newName
  def password = fields.password
  def password_=(newPassword: String) = fields.password = newPassword
}

Subclassing

case class Customer(name: String)

class ValidatingCustomer(name0: String) extends Customer(name0) {
  require(name0.length < 5)

  def name_=(newName : String) =
    if (newName.length < 5) error("too short")
    else super.name_=(newName)
}

val cust = new ValidatingCustomer("xyz123")

Thanks to Kevin Wright for the last two examples.

Daniel
there is a type in "private case class Fields" definition
efleming969
@efleming969: thanks, fixed.
Daniel
+8  A: 

Usage Mistake

On Unix/Linux/*BSD, naming your host (as returned by hostname) something and not declaring it on your hosts file. Particularly, the following command won't work:

ping `hostname`

In such cases, neither fsc nor scala will work, though scalac will. That's because fsc stays running in background an listening to connections through a TCP socket, to speed up compilation, and scala uses that to speed up script execution.

Daniel
+6  A: 

Assignment via Patterns

In scala it is possible to assign to vals (and vars) via patterns. For example:

scala> val x :: xs = List(1, 2)
x: Int = 1
xs: List[Int] = List(2)

However, this assignment is not typesafe because it has used pattern matching. For example, the following also compiles!

scala> val y :: ys : List[String] = List(1, 2)
warning: there were unchecked warnings; re-run with -unchecked for details

And then it falls over when it runs:

java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at RequestResult$.<init>(<console>:4)
    at RequestResult$.<clinit>(<console>)
    at RequestResult$scala_repl_result(<console>)

This can be pretty dangerous! It's really easy to use this assignment via a pattern (useful with tuples) and then find, following a refactoring, your code breaks at runtime.

oxbow_lakes
Why isn't it considered as a bug of type checker?
andreypopp
+2  A: 

Traits defining equals, combined with case classes

If a trait overrides equals with a non-trivial implementation (i.e. not just doing super.equals(o)), it's a hideous error to mix that trait in a case class that has the compiler-generated equals() implementation.

This is because equality in case classes by default ignores any mixed traits. For example, given:

case class A()
trait T

these are equal:

A() == new A with T
new A with T == A()

So, the case class (by default) extends the equivalence relation to all instances and traits mixed in those instances, so a trait can't modify the behavior of equals, because if, whether it is weakened or made more strict, symmetry breaks (from the example above, only the second expression will change value, while the first will remain constant), thus equals breaks.

This reasoning is useful in other contexts too. If a superclass defines an equals() that is supposed to let an instance of a superclass be equal to an instance of a subclass, then by the same token, the subclass can't redefine equals - it is already defined for it (through symmetry) and it must follow suit.

Thus, consider this example:

class A(val x: Int)
class B(override val x: Int, val y: Int) extends A(x)

If we wanted to consider a new A(5) as equal to new B(5, 10), that's ok, the only problem is that we can no longer treat new B(5, 10) as NOT equal to new B(5, 125) -- these must be treated equal too! This is because:

val a = new A(5)
val b1 = new B(5, 10)
val b2 = new B(5, 125)

a == b1 //assuming this is true

then

a == b2 //also true

then

b1 == b2 //due to transitivity of equals

Which is counter-intuitive and most probably not what we wanted.

So, we see in that example that because a class defines equality with elements not of the same runtime type (but subclasses too), those other types are forced to delegate to exactly the same equals() logic. Simply put, defining equals to handle subclasses too, and take in consideration any extra state/fields of these subclasses (or traits, as above), is wrong.

The latter case is well known due to Effective Java of Josh Bloch, while the interplay of equals traits and case classes is more unique to Scala.

Dimitris Andreou
A: 

Using a scala immutable.Map (or immutable.Set) for a high-throughput collection, particularly if immutability is not required. Currently (i.e. Scala 2.7.7) , the default implementation is based on a hidden, mutable, synchronized hash-table, which maintains a history of previous "versions", which is compacted.

In the case of extremely high-throughput collections (hundreds of thousands of insertions/removals), this structure is incredibly inefficient and stressses the garbage collector enormously. Use a mutable.Map (or mutable.Set) instead!

This should be alleviated in scala 2.8 with Tiark Rompf's conversion of Phil Bagwell's data structure (currently used in Clojure)

oxbow_lakes
@oxbow_lakes: Please monitor Scala library development progress and update this as necessary, since this is likely to become untrue in the not-too-distant future.
Randall Schulz
Indeed - I should have made that more clear: I'm aware of the work being done by Tiark
oxbow_lakes
+1  A: 

Not linking your Actors, and not setting trapExit=true ! Always link your actors, otherwise you will have no idea if some propagates out of a reaction, causing your actor to exit.

I often find that it's nice to have actors in some kind of tree (i.e. there's a primary application actor at the top of the tree and every lowly worker actor is ultimately linked by other actors to the boss). So then I'd link the actors down the tree in such a way that the failure of a worker actor causes error propagation up the tree and then the boss can take some action as appropriate.

oxbow_lakes
+5  A: 

Using identifiers starting with lower case letter in pattern matching

abstract class Animal(val name: String)
class Dog(name: String) extends Animal(name)
object iMax extends Animal("iMax")

class Foo {
  def pet(animal: Animal) = animal match {
    case iMax   => println("iMax")
    case d: Dog => println("dog")
    case _      => error("Unsupported animal")
  }
}

The above code does not compile because case d: Dog and case _ are unreachable. By specification, identifiers starting with lower case letters are interpreted as variable pattern, which matches anything. You can place backtick around the variable as follows:

case `iMax` => println("iMax")

or just don't name the singletons starting with lower case letters if you're going to use it in pattern matching.

eed3si9n
Would make a freaking awesome checkstyle-scala check
retronym
+4  A: 

Sicking to imperative designs, instead of using the more functional approach that Scala encourages.

Bad:

class Item(val name: String)

val items: ListMap[String, Item] = new ListMap()

val itemOne   = new Item("One")
val itemTwo   = new Item("Two")
val itemThree = new Item("Three")

items += itemOne.name   -> itemOne
items += itemTwo.name   -> itemTwo
items += itemThree.name -> itemThree

Good:

case class Item(name: String)

val itemSeq = Seq(
  Item("One"),
  Item("Two"),
  Item("Three"))

val itemMap = itemSeq.map(i=> i.name -> i).toMap

The second approach eliminates a number of possible problems.

  • by constructing the map in a single operation, it can't be seen from another thread in a partially-built state
  • The map is immutable, you can be certain that it wont be changed elsewhere. The JVM can also use this to optimise for more performance.
  • There's less duplication, reducing the risk of human error.
Kevin Wright
+4  A: 

There is a difference between foreach and foreach(_) - consider the following code:

object Foo {
  var function: (Int)=>Unit = firstFunction

  def firstFunction(a: Int): Unit  = { 
    println("First: "+a)
    function = secondFunction 
  }

  def secondFunction(a: Int): Unit = { println("Second: "+a); }

  def main() = (1 to 2).foreach(function)
  def main2() = (1 to 2).foreach(function(_))
}

Calling Foo.main() will print

First: 1
First: 2
Calling Foo.main2() will print
First: 1
Second: 2
That's because foreach will execute it's argument, in the first example that's the current value of function, in the second, it's a function that calls the current value of function.

tstenner
There you go... I didn't think there was something so basic to learn about Scala at this point!
Daniel