I'm just digging a bit into Haskell and I started by trying to compute the Phi-Coefficient of two words in a text. However, I ran into some very strange behaviour that I cannot explain.

After stripping everything down, I ended up with this code to reproduce the problem:

let sumTup = (sumTuples . concat) frequencyLists
let sumFixTup = (138, 136, 17, 204)
putStrLn (show ((138, 136, 17, 204) == sumTup))
putStrLn (show (phi sumTup))
putStrLn (show (phi sumFixTup))

This outputs:

True
NaN
0.4574206676616167

So although sumTup and sumFixTup compare as equal, they behave differently when passed to phi.

The definition of phi is:

phi (a, b, c, d) = 
    let dividend = fromIntegral(a * d - b * c)
        divisor = sqrt(fromIntegral((a + b) * (c + d) * (a + c) * (b + d)))
    in dividend / divisor
+2  A: 

Can you provide a type signature for sumTuples and sumTup?

In GHCi:

:t sumTuples
:t sumTup
Dan
+8  A: 

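This might be a case of integer overflow. The value being passed into fromIntegral in your divisor is 3191195800, which is larger than a 32-bit signed Int can hold (the maximum is 2147483647). The product wraps around to a negative number, and sqrt of a negative Double is NaN.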
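To make the arithmetic concrete, here is a minimal sketch of the wraparound. It assumes a fixed-width Int32 from Data.Int as a stand-in for a 32-bit Int (on a 64-bit GHC, plain Int would not overflow here), and divisorProduct is a hypothetical helper name, not something from the question:

```haskell
import Data.Int (Int32)

-- Hypothetical helper: the product under the square root in phi,
-- generic over the integral type so it can be evaluated at several widths.
divisorProduct :: Integral a => (a, a, a, a) -> a
divisorProduct (a, b, c, d) = (a + b) * (c + d) * (a + c) * (b + d)

main :: IO ()
main = do
  let asInteger = divisorProduct (138, 136, 17, 204) :: Integer
      asInt32   = divisorProduct (138, 136, 17, 204) :: Int32
  print asInteger                                -- 3191195800
  print asInt32                                  -- wrapped around, negative
  print (sqrt (fromIntegral asInt32 :: Double))  -- NaN
```

The Integer version keeps the full value, while the Int32 version silently wraps, which is where the NaN would come from.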

In ghci (or whatever you're using), use

:t sumTup
:t sumFixTup

to see the types of those variables. I'm guessing you'll find that sumTup is (Int, Int, Int, Int) (overflows) and sumFixTup is (Integer, Integer, Integer, Integer) (doesn't overflow).

Edit: on second thought, a tuple of Ints can't be equal to a tuple of Integers. Even so, I think that ghci will fix the type of sumFixTup to be a tuple of Integers, while sumTup probably has a type of the form (Num a) => (a, a, a, a) or (Integral a) => (a, a, a, a), which depends on the function defining it.

Ghci will then convert them to Integers to compare with sumFixTup, but may convert them to Ints when calculating the divisor in phi, causing overflow.


Another edit: KennyTM, you're half right:

Prelude> :t (1,2,3,4)
(1,2,3,4) :: (Num t, Num t1, Num t2, Num t3) => (t, t1, t2, t3)
Prelude> let tup = (1,2,3,4)
Prelude> :t tup
tup :: (Integer, Integer, Integer, Integer)

So for the examples given in the question:

putStrLn (show ((138, 136, 17, 204) == sumTup))

The literal (138, 136, 17, 204) is inferred to be a tuple of Int to match sumTup, and they compare equal.

putStrLn (show (phi sumTup))

sumTup consists of Ints, causing overflow as suggested above.

putStrLn (show (phi sumFixTup))

sumFixTup consists of Integers, giving a correct result. Note that sumTup and sumFixTup were never compared directly, so my earlier edit was based on a misreading.
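A small sketch of that inference, assuming sumTup really is a tuple of Ints (the definition below is only a stand-in for the question's computed value):

```haskell
-- Stand-in for the question's sumTup, assumed to be a tuple of Ints.
sumTup :: (Int, Int, Int, Int)
sumTup = (138, 136, 17, 204)

-- Explicitly a tuple of Integers, like the let-bound tuple in GHCi above.
sumFixTup :: (Integer, Integer, Integer, Integer)
sumFixTup = (138, 136, 17, 204)

main :: IO ()
main = do
  -- The literal has no fixed type of its own, so it is inferred as
  -- (Int, Int, Int, Int) to match sumTup, and the comparison type checks.
  print ((138, 136, 17, 204) == sumTup)
  -- The same literal is inferred as a tuple of Integers here.
  print ((138, 136, 17, 204) == sumFixTup)
  -- Comparing the two named tuples directly is rejected by the type checker:
  -- print (sumTup == sumFixTup)
```

So the comparison in the question never mixes Int and Integer; the literal just adopts whichever type the other side has.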

Nefrubyr
Heh, we had the same intuition; I'm curious if this is true.
Dan
in fact, according to http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.2.0.0/Data-Int.html, `Int` is only guaranteed to be at least a 30-bit signed int
newacct
Actually `sumFixTup` has a type of `(Num t, Num t1, Num t2, Num t3) => (t, t1, t2, t3)`.
KennyTM
I haven't gotten to the types yet (they're local to the main method and Leksah doesn't let me break there for some reason), but defining the type of `sumFixTup` to be (Int, Int, Int, Int) makes it overflow as well. So I think you're right. I'm wondering why those tuples compare equal, though. Tuples with different element types are different types and shouldn't be comparable at all, should they?
Johannes Stiehler
So you're definitely right. I fixed the problem by defining the signature of `phi` to require Integer values as input and removing all other explicit type signatures to let the Haskell type system do its magic. The problem stemmed from the fact that, trying to follow "best practice for beginners" (TM), I defined the signatures but defined them too narrowly, thus forcing `Int` to be used. I have to remember to be more polymorphic in the future. Thanks for the answers.
Johannes Stiehler
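For reference, a sketch of what the fix described in the last comment might look like. The exact signature is an assumption (the comment only says phi was changed to require Integer inputs), and the Double result type is a guess:

```haskell
-- phi from the question, with a signature that forces arbitrary-precision
-- Integer arithmetic before the conversion to Double. The signature is an
-- assumption; the thread does not show the final version.
phi :: (Integer, Integer, Integer, Integer) -> Double
phi (a, b, c, d) =
    let dividend = fromIntegral (a * d - b * c)
        divisor  = sqrt (fromIntegral ((a + b) * (c + d) * (a + c) * (b + d)))
    in dividend / divisor

main :: IO ()
main = print (phi (138, 136, 17, 204))  -- roughly 0.4574, as in the question
```

Because the products are computed on Integers, they cannot wrap around, whatever the platform's Int width is.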