Saturday, July 9, 2016

I was wrong about Cowan - 78 seats now possible

In my last post, I'd all but assumed Cowan would go to Labor, but in a completely unexpected twist the first absent votes have swung even more strongly to the Libs than the postal votes did. My model now rates a Coalition victory as more likely than not, and this is quite a conservative model, as it assumes the different non-ordinary vote types (e.g. absent and postal votes) will swing together. If this early trend in absent votes isn't just a fluke, Cowan could be a very comfortable LNP victory despite being marked as a probable Labor win by analysts.

That means that if Herbert and Flynn fall to the LNP as expected (by me, but I could easily be wrong again!), their promising Capricornia results continue, Forde goes their way as analysts also expect, and absent votes get them over the line in Cowan, then 78 seats, if not likely, is at least far more feasible than it looked last night.

Friday, July 8, 2016

Battle of the Forde

I'd actually managed to stay away from election stuff today but it drew me back with a couple of articles from the ABC, with Antony Green and Barrie Cassidy weighing in on the situation. My prediction of 76 or 77 seats is looking pretty likely, but the thing that interested me is Green and Cassidy both suggesting that the Coalition is "on track" and "too far ahead" in Forde respectively.

Now, Herbert and Flynn look to be pretty much wrapped up for the Coalition, and Capricornia is still close but may well head in that direction after a better result for them in recent counting, giving them a likely 76 seats. That is by no means certain though, as the graph below may help demonstrate. (As in previous posts, the Libs win if the red line is above the black horizontal line at the right side of the graph; current progress based on counting is shown as red crosses; simulated possible futures are the red wibbly lines; results if non-ordinary votes behave as they did in 2013 relative to the ordinary votes are in black; and the track the Coalition needs to win is in blue.)

Forde, on the other hand, is looking very interesting. The Coalition is probably more likely than not to win, but after a relatively strong showing by the ALP in recently counted absent votes (the downward red kink) there's not a lot in it, and even with a few data points in, my simulations still show a lot of variation that could easily bounce either way in the remaining votes. So I'm surprised at the analysts' confidence, especially given they've been relatively cagey on other results. Maybe there's something I'm missing?

And then there's Hindmarsh, which is expected to go to Labor and is looking good for them, but it's still early days...

All of this together means that while the 77 seat scenario is looking likely, any result between 75 and 78 is still very possible depending on these three seats. And then the only close seat not mentioned is Cowan, which is an expected Labor win, but anything can still happen...

Thursday, July 7, 2016

Postal votes coming in

As the postal votes trickle in for the remaining seats, the situation seems to be getting firmer (see previous analysis here) on top of the 72 locked-in seats for the Coalition - Capricornia remains on a knife edge but the remaining seats are looking more and more like Coalition wins except for Cowan and Hindmarsh (likely to go to the ALP).

I've cobbled together simulations of 100 potential non-ordinary vote futures for each seat (thin red lines below), all of which go to the LNP in Flynn, Gilmore (already called for the Coalition on ABC) and Herbert. In Forde it's less clear, but 90 simulations still go the LNP's way; in Capricornia, 54 pick the LNP, making it a real toss-up; and Hindmarsh has only 31 to the LNP and Cowan 6. I may explain how I put the simulations together in a later post.

According to this so far (and things can easily change - my assumptions are cobbled-together and by no means rock solid), the Coalition is looking pretty well set for 75 seats, likely for 76 and still very possible for 77 seats.

Wednesday, July 6, 2016

The 77 seats scenario

Now that the postal votes have started coming in, I've been casting around for a way to visualise what's going on. One approach I've found useful, as in my previous post, is to use the predicted swings in non-ordinary votes (postals etc.) compared to ordinary votes, using the 2013 results as a guide; these swings tend to go to the Coalition. In the plots below, I've put those predictions in black, the relative overall swing needed on top of the postals swing for the Coalition to hit 50-50 in blue, and the data so far in red. I've assumed the postals will come in first (they're all we have so far) and the other vote types afterwards (absent, pre-poll and provisional in that order - this probably isn't accurate!).
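The "needed" line boils down to simple arithmetic: if the LNP is ahead or behind by some margin with a given number of votes still to count, it needs a particular share of those remaining votes to finish level. Here's a minimal Python sketch of that calculation, assuming the number of remaining formal two-candidate-preferred (2CP) votes is known (in practice it's an estimate). `break_even_share` is a hypothetical helper, not necessarily how the blue lines here were drawn.

```python
def break_even_share(lnp_margin: int, remaining_votes: int) -> float:
    """LNP share of the remaining 2CP votes needed to finish exactly level.

    lnp_margin: current LNP-minus-ALP 2CP margin (negative if the LNP trails).
    remaining_votes: estimated formal 2CP votes still to be counted.
    """
    if remaining_votes <= 0:
        raise ValueError("remaining_votes must be positive")
    # The LNP gains remaining_votes * (2 * share - 1) on the ALP from the rest of the count,
    # so finishing level needs remaining_votes * (2 * share - 1) == -lnp_margin.
    return 0.5 - lnp_margin / (2 * remaining_votes)


# A hypothetical seat where the LNP trails by 300 with 10,000 votes left to count
print(f"{break_even_share(-300, 10_000):.1%}")  # 51.5%
```

A seat where the LNP is already ahead (like Forde's 94 votes after ordinary votes) gives a break-even share below 50%, so it can afford to lose the remaining votes narrowly.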

In the seat of Forde, my earlier model currently predicts an LNP win by about 400 votes despite them only being ahead by 94 votes after ordinary votes. Though commentators were surprised by a "Coalition surge" as postal votes started coming in, it's actually quite consistent with 2013 results (see the red cross).

I've checked a bunch of other seats that my model predicted would be close (Batman, Chisholm, Cowan, Flynn, Gilmore, Herbert and Hindmarsh) and most of the other results have been relatively consistent with 2013 patterns as well (if not better for the Coalition), currently confirming my 76 seats prediction - except for one. Herbert seems to be looking particularly good for the Coalition, well above where they need to be to make up the difference (see graph below - they're well above the predicted trend, and even well above the needed blue line).

I can't stress enough that it's early days, other seats with postals yet to count could well swing the other way to the ALP, and the number of remaining votes could be well out, but if current trends continue, they could pick up an extra seat to make 77. Interesting times ahead...

Update #1: Things are looking worse than expected for the Coalition in Capricornia - it looks too close to even guess one way or the other. As it stands currently, on top of the ABC's current 72 to Coalition, 66 to ALP:

Probable LNP: Flynn, Forde, Gilmore, Herbert
Probable ALP: Cowan, Hindmarsh
Complete guess: Capricornia

For interest, this is why I think Flynn is a probable LNP win despite it having been in the ABC's ALP column until recently:

Tuesday, July 5, 2016

Election results madness

With help from the AEC scraping code by Mick McCarthy here, I've tried to predict the election results once the "non-ordinary" votes (absent, pre-poll, postal and provisional) have been counted. These tend to go more to the Liberal Party, so seats where Labor is currently ahead might be lost.

I've assumed that:
  • all ordinary votes are in (only non-ordinary votes are left);
  • the proportion of formal votes remains the same as in 2013 for all electorates;
  • the numbers of each type of non-ordinary vote all increase in proportion to each other; and
  • the changes in two-candidate-preferred votes between vote types (I call this "bias") in each electorate stay the same between 2013 and 2016 (e.g. if the Libs got a 2% bounce in postals in Denison in 2013 compared to their ordinary vote result, that same bounce holds in 2016).
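Put together, these assumptions give a projected final margin for each seat. Here's a minimal Python sketch of one way to combine them; the function and parameter names are hypothetical, the inputs are made up rather than real AEC figures, and I've assumed "bias" is measured as a difference in the LNP's 2CP share (in fractions, so 0.02 is 2 percentage points).

```python
def projected_lnp_margin(
    lnp_ordinary: int,
    alp_ordinary: int,
    ballots_2013: dict[str, int],
    growth: float,
    formal_rate_2013: float,
    bias_2013: dict[str, float],
) -> float:
    """Project the final LNP-minus-ALP 2CP margin once the non-ordinary votes are counted.

    ballots_2013: non-ordinary ballots cast by vote type in 2013 (e.g. "postal", "absent").
    growth: one common scaling factor applied to every vote type (assumption 3).
    formal_rate_2013: proportion of those ballots that were formal in 2013 (assumption 2).
    bias_2013: LNP 2CP share for each vote type minus its ordinary-vote share in 2013 (assumption 4).
    """
    ordinary_share = lnp_ordinary / (lnp_ordinary + alp_ordinary)
    margin = float(lnp_ordinary - alp_ordinary)
    for vote_type, ballots in ballots_2013.items():
        formal_votes = ballots * growth * formal_rate_2013
        # Keep the 2013 gap between this vote type and the ordinary vote, clamped to [0, 1]
        share = min(max(ordinary_share + bias_2013.get(vote_type, 0.0), 0.0), 1.0)
        margin += formal_votes * (2 * share - 1)  # LNP votes minus ALP votes of this type
    return margin


# Made-up numbers for a hypothetical, perfectly tied seat
margin = projected_lnp_margin(
    lnp_ordinary=50_000, alp_ordinary=50_000,
    ballots_2013={"postal": 10_000, "absent": 5_000},
    growth=1.0, formal_rate_2013=1.0,
    bias_2013={"postal": 0.02, "absent": -0.02},
)
print(round(margin))  # 200
```

A positive result means the LNP is projected to finish ahead, which is the sign convention of the list of seats below.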
It turns out that William Bowe of the Poll Bludger, my favourite election analysis page, pipped me to the post with his analysis here, but I thought I'd run mine anyway. It gave similar results, but, importantly, different enough to get the Coalition over the line. Here are the numbers of votes by which the LNP (Liberal/Nationals party, well, Coalition) is expected to be ahead (+) or behind (-) in the closest seats:

Capricornia LNP +547
Chisholm LNP +1230
Cowan LNP -580
Flynn LNP +1469
Forde LNP +86
Gilmore LNP +713
Herbert LNP -315
Hindmarsh LNP -786
Melbourne Ports LNP -1224 (I didn't do 3CP analysis though...)
Petrie LNP +1447

The Poll Bludger's analysis has them behind in Forde by 18 votes - my extra seat gives them 76 seats, a majority in their own right, whereas his 75 seats is not enough.

I then had a play with adding random variation, relaxing the assumptions that the proportion of formal votes and the amount of bias stay the same. I assumed that the overall variance between electorates for these properties stayed the same, but that a little (10%), half (50%) or all (100%) of that variance was due to random variation rather than the specific effect of being in one electorate or another. The more of the variance I put down to chance rather than to the electorate, the less likely a Coalition victory became: with a little random variation, 66% of simulations resulted in a win; with half, 34%; and with full random variation, only 20%.
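One way to split the variance like this is to shrink each electorate's own deviation from the average and top it up with fresh noise, scaled so the overall spread stays the same. The Python sketch below is an assumed scheme, not necessarily the one behind the percentages above; `resample_bias` is a hypothetical helper and the example values are made up.

```python
import random
import statistics


def resample_bias(bias_2013: list[float], random_fraction: float, rng: random.Random) -> list[float]:
    """Redraw one value per electorate for a single simulation run.

    A fraction `random_fraction` of the between-electorate variance is treated as
    pure chance and redrawn; the rest is kept as a persistent electorate effect.
    The scaling keeps the expected overall variance the same as in 2013.
    """
    mean = statistics.fmean(bias_2013)
    sd = statistics.pstdev(bias_2013)
    keep = (1 - random_fraction) ** 0.5   # scale on each electorate's own deviation
    noise = random_fraction ** 0.5        # scale on the fresh random draw
    return [mean + keep * (b - mean) + noise * sd * rng.gauss(0.0, 1.0) for b in bias_2013]


rng = random.Random(2016)
example_biases = [0.031, 0.012, -0.004, 0.025, 0.018]  # made-up values
for fraction in (0.1, 0.5, 1.0):
    print(fraction, [round(b, 3) for b in resample_bias(example_biases, fraction, rng)])
```

In a full run you'd redraw each property (bias for each vote type, formal rate) like this once per simulation, recount every seat, and record how often the Coalition reaches 76; that outer loop is left out here.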

Edit: Using 2010 results instead gives similar results, though slightly worse for the Coalition - 76 seats for the Coalition without variation, and 53%, 20% and 16% in the three scenarios described respectively.

Edit #2: Using 2007 results is more difficult because of seat redistributions, but we can do it if we assume that Liberals and Nationals will experience the same swings in Capricornia and Flynn. The model also gives 76 seats for the Coalition without variation, and 91%, 49% and 27% for the scenarios.

The model also pointed to some seats that could potentially be very close (winning 2PP < 50.2%) in these situations, so watch for these potentially coming into play if things get even more interesting:

Batman (VIC)
Longman (QLD)

Banks (NSW)
Dickson (QLD)
Dunkley (VIC)
Griffith (QLD)
La Trobe (VIC)
Lindsay (NSW)
Robertson (NSW)

Wednesday, August 26, 2015

St Petersburg Paradox

I was having lunch with a teacher friend the other day, discussing some interesting examples of how statistics and probability can get kind of weird. He loved the Birthday Problem and decided to use it for his class, but was particularly fascinated by the trickier St Petersburg Paradox.

The problem goes thus: there is a game that costs X dollars to play, which simply involves tossing a coin. You start with a pot of $2, and every time the coin comes up heads the banker doubles the pot. As soon as the coin comes up tails the game ends, and you get to walk away with the pot. The question is, how much is a reasonable amount of money X to play the game?

Where the paradox comes in is how statistics defines 'fair'. Usually we calculate the average, or "expected", amount of money to be made from the game by adding up every possible payout multiplied by its probability. In this game, we have a 50:50 chance of getting $2 (the first throw being a tail), then a 1/4 chance of getting $4 (a head, then a tail), then a 1/8 chance of getting $8 (heads, heads, tails) and so on. That means we can expect on average $1 from the worst-case scenario (it pays $2 and happens half the time, and $2 x 1/2 = $1), another $1 from the heads-tails scenario ($4 x 1/4 = $1), and so on. This process goes on forever (it's always possible to get more heads), so the average amount we expect to win in this game is $1 + $1 + $1 + ... = infinite money, and that's apparently how much we should be willing to pay to play the game.
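The sum above is easy to check with exact fractions. A game that ends on toss k pays $2^k with probability 1/2^k, so every term is 2^k x 1/2^k = $1. This minimal Python sketch adds up the terms for games that end within a given number of tosses (`truncated_expected_value` is just an illustrative name); the total grows by $1 for every extra toss allowed, so it never settles down.

```python
from fractions import Fraction


def truncated_expected_value(max_tosses: int) -> Fraction:
    """Expected payout counting only games that end within `max_tosses` tosses."""
    total = Fraction(0)
    for k in range(1, max_tosses + 1):
        probability = Fraction(1, 2**k)  # k - 1 heads followed by a tail
        payout = 2**k                     # the $2 pot doubled k - 1 times
        total += probability * payout     # each term contributes exactly $1
    return total


for n in (1, 10, 100):
    print(n, truncated_expected_value(n))  # grows without bound: 1, 10, 100
```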

This obviously doesn't make sense. For a start, the coin will come up tails at some point, so it's physically impossible for you to win infinite money no matter how many heads you get. The problem is that the idea of an expected amount of money rests on what happens in the long run: it assumes we play this game infinitely many times and take the average. But when we play infinitely many times, we suddenly have access to the end of the rainbow where we're making infinite money, and infinity is a mathematical construct that we never see in reality. Usually we can deal with it pretty happily without weird things happening, but this is a weird game, and it breaks our usual assumptions.

What we can do instead is see what's most likely to happen to our winnings as we keep playing. For a single game, it's pretty clear that most of the time we'll win either $2 or $4 (with a 50% and 25% chance respectively), and occasionally $8 (12.5%), but we're not likely to win much more than that. If we play two games, then our worst-case scenario is that we'll win $4, with a 1/2 x 1/2 = 1/4 chance. There are two ways we can win $6: we can win $2 then $4, or $4 then $2. Each of these has a 1/2 x 1/4 = 1/8 chance of happening, so overall there's a 1/4 chance of winning $6 too. We can calculate the other possibilities the same way; obviously we have to stop at some point, but we can go far enough to get a decent idea. We can then keep going and see what happens as we play more and more games in a row, where getting bigger jackpots becomes more and more likely.
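The two-game calculation generalises: the distribution of total winnings after n games is the single-game distribution combined with itself n times (a convolution). Here's a minimal Python sketch with exact fractions, assuming each game is cut off after `max_tosses` tosses, so the probabilities add up to slightly less than 1. The function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def single_game_distribution(max_tosses: int) -> dict[int, Fraction]:
    """Payout -> probability for one game, ignoring games longer than max_tosses tosses."""
    return {2**k: Fraction(1, 2**k) for k in range(1, max_tosses + 1)}


def total_winnings_distribution(n_games: int, max_tosses: int = 20) -> dict[int, Fraction]:
    """Total winnings -> probability after n_games, by repeated convolution."""
    single = single_game_distribution(max_tosses)
    dist = {0: Fraction(1)}  # before playing, we have won $0 with certainty
    for _ in range(n_games):
        new_dist: dict[int, Fraction] = defaultdict(Fraction)
        for total, p_total in dist.items():
            for payout, p_payout in single.items():
                new_dist[total + payout] += p_total * p_payout
        dist = dict(new_dist)
    return dist


two_games = total_winnings_distribution(2)
print(two_games[4], two_games[6])  # 1/4 1/4
```

This brute-force approach gets slow as the number of games grows, because the number of possible totals explodes; capping or binning the totals is the usual way around that.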

Of course, the best way to do this is with a computer to avoid all those pesky calculations. Here is a graph of the possibilities over the course of 100 games:

Lighter colours represent where a possibility is relatively likely, and dark colours where it is unlikely. You can see little waves towards the top-left of the graph - this is where after a few games there's a small but decent chance of getting a single big win which overwhelms all of the other winnings. Especially when not many games have been played, it's more likely that you'll get a single big win and a lot of small wins than multiple medium-sized wins.

The blue line represents the median of the average winnings per game, and is surrounded by red interquartile lines: half of the time, your winnings per game after a certain number of games will be between the two red lines. For example, after 50 games, it's 50-50 whether your average winnings are above or below $8.20 (the median), and half the time your average winnings will be between $6.12 and $12.44. So if you paid only $6 a game, you're probably doing pretty well at this point!
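As a rough cross-check of these numbers, here's a Monte Carlo sketch in Python. It's a swapped-in technique (random simulation rather than working through the possibilities), so expect its quartiles to differ slightly from the figures above; the function names, trial count and seed are arbitrary choices.

```python
import random
import statistics


def play_game(rng: random.Random) -> int:
    """One game: the $2 pot doubles on every head and is paid out on the first tail."""
    pot = 2
    while rng.random() < 0.5:  # treat values below 0.5 as heads
        pot *= 2
    return pot


def average_winnings_quartiles(n_games: int, n_trials: int = 10_000, seed: int = 1):
    """Quartiles of the average winnings per game over `n_trials` runs of `n_games` games."""
    rng = random.Random(seed)
    averages = [
        sum(play_game(rng) for _ in range(n_games)) / n_games for _ in range(n_trials)
    ]
    q1, median, q3 = statistics.quantiles(averages, n=4)
    return q1, median, q3


q1, median, q3 = average_winnings_quartiles(50)
print(f"after 50 games: median ${median:.2f}, middle half ${q1:.2f} to ${q3:.2f}")
```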

The most important part of this graph is that these numbers keep going up as we play more games, meaning that the game becomes more and more reliably profitable. Further along the graph, the computer can no longer keep track of the higher winnings (which is why the red line disappears), so we need another way to work out what happens with more than 100 games. Using results cited in this paper, we can estimate the median winnings per game as

$2.55 + log2(number of games)

So after 100 games, $9.20 looks like a reasonable price - paying that price, half the time we'll end up ahead, the other half we won't. Note that the distribution is what statisticians call skewed - even though we only come out ahead half the time after 50 games, the "good" half is a lot better than the "bad" half is bad.

Let's say that we really want to milk this game for all it's worth, and we've found a game online that we can make our computer play for us. If we can play a million games a second, and leave our computer running for a year, that's over 30 trillion games. If we put that into our formula, we get a median win of $47.40 per game. If we paid that much per game to play, we'd expect to lose a lot of money at the start but make it back as the games wore on and we got more and more jackpots, breaking even after a year. However, if we only paid $9.20 as before, we'd expect to be doing OK by 100 games (i.e. after 100 microseconds), and by the time our program had been running for a year, we'd be looking at profits of around $1200 trillion - 700 times Australia's GDP and enough to basically rule the world.
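The arithmetic in this paragraph can be reproduced in a few lines of Python using the median formula quoted above. The 365.25-day year is my assumption, and `median_winnings_per_game` is just an illustrative name for the formula.

```python
import math


def median_winnings_per_game(n_games: float) -> float:
    """Approximate median winnings per game after n_games, using the formula quoted above."""
    return 2.55 + math.log2(n_games)


SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
games_in_a_year = 1_000_000 * SECONDS_PER_YEAR   # a million games a second, for a year

price = 9.20
median_after_year = median_winnings_per_game(games_in_a_year)
profit = (median_after_year - price) * games_in_a_year

print(f"games played: {games_in_a_year:.3g}")   # about 3.16e+13
print(f"median winnings per game after 100 games: ${median_winnings_per_game(100):.2f}")
print(f"median winnings per game after a year: ${median_after_year:.2f}")
print(f"median profit at ${price:.2f} a game: ${profit / 1e12:,.0f} trillion")
```

The median after 100 games comes out at about $9.19, which rounds to the $9.20 price used above.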

Unfortunately, no casino will ever host this game, online or otherwise, for exactly this reason. Sooner or later, the house will always lose.

Thursday, April 23, 2015

Quadruple rainbow!

A couple of days ago, someone at a train station in New York tweeted this photo of a quadruple rainbow:

Like most people, I'd never even heard of such a thing! Some reasonably reputable sites assured me that such a thing exists, and is caused by the combination of two things:

The first is that light can take two different paths inside water droplets, reflecting either once or twice off the inside of the drop, and the twice-reflected path gives us another "secondary" rainbow.

The second is the effect of a body of water, usually behind the observer along with the sun, reflecting the sun - it acts just like another (though less bright) sun shining from a different location, and gives us another pair of rainbows. Because the second pair of rainbows is from another "virtual sun", the centre of the rainbow is in a different place so they're offset a bit from the first pair, hence the weird shapes.

So, that makes sense. But there's water everywhere! Rainbows aren't that uncommon, and even double rainbows are seen occasionally, so why are quadruple rainbows so rare? I've never seen one, and I've seen plenty of double rainbows!

First, the reason we don't see rainbows all the time is that we need the sun to be shining from behind us and rain falling in front of us, so that there are water droplets for the sunlight to reflect off. Often the weather is one or the other: either all rain and clouds (hence no sun) or no rain. Also, if the sun is too high in the sky, a rainbow can't happen. A raindrop has to bend the light back by a certain amount (40-42° for a normal rainbow, 50-53° for a secondary one), so the light you need for a rainbow can't reach your eye from above the horizon if the sun's elevation is more than about 42°. For a secondary rainbow the sun only needs to be less than 53° above the horizon, but the reflections are a lot weaker (the light has to bounce off the inside of the raindrop twice rather than once, losing some of its brightness at each bounce), so it's a lot harder to see unless the conditions are just right.
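A simple way to see the sun-height limit: each bow is a circle centred on the point directly opposite the sun, which sits as far below the horizon as the sun is above it, so the top of the bow is at (bow radius - sun elevation). For light from the sun's reflection in flat water, that centre point is instead as far above the horizon as the sun. Here's a minimal Python sketch using 42° and 53° as the outer edges of the two bows (an assumption taken from the ranges above); `apex_elevation` is an illustrative name.

```python
def apex_elevation(bow_radius_deg: float, sun_elevation_deg: float, reflected_sun: bool = False) -> float:
    """Elevation in degrees of the top of a rainbow arc (negative means below the horizon).

    Each bow is a circle of the given angular radius centred on the point directly
    opposite the light source. For the real sun that point is as far below the horizon
    as the sun is above it; for the sun's reflection in flat water it is as far above.
    """
    centre = sun_elevation_deg if reflected_sun else -sun_elevation_deg
    return centre + bow_radius_deg


for sun in (10, 30, 41, 45, 55):
    primary = apex_elevation(42, sun)     # outer (red) edge of the primary bow
    secondary = apex_elevation(53, sun)   # outer edge of the secondary bow
    print(f"sun at {sun}°: primary top {primary:+}°, secondary top {secondary:+}°")
```

With the sun at 45°, the primary bow's top is below the horizon while the secondary's is still just above it, matching the limits described here.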

Here's a drawing from this site showing the path of light from the sun (yellow lines) at sunrise or sunset, and how it bounces off raindrops to create the blue and red edges of the rainbow (these colours are bent by different amounts, hence the colours of a rainbow). The higher the sun is in the sky, the more downward-pointing those yellow lines will be, and the closer to the horizon and harder to see the rainbow will be (to see the effect, tilt your head to the left and imagine the ground is still horizontal from your perspective).

To get the other two rainbows, we also need to have a body of water the right distance away behind you (it's also possible in front of you, but more difficult) to create another "sun" that will also make two rainbows - this will already be more difficult because the reflected sun will be less bright depending on how good a mirror the water body is.

We can do a bit of geometry and work out the distances required. Given the sunlight is reflecting off a raindrop a certain distance in front of us, this plot shows how high the raindrop would need to be above the ground, and how far behind us the water would need to be, both relative to the distance of the rain, for the primary and secondary (dotted in the plot) reflected rainbows to occur:

We can work out a few things from this. First, the apparent height of the primary and secondary original rainbows (green) is never much larger than the distance to the rain: the higher the sun is in the sky, the lower the height relative to distance. As the rain goes right to the ground, this doesn't really restrict us at all until the rainbow goes below the horizon and we can't see it.

However, for the reflected rainbows (black), it's the opposite - the higher the sun is in the sky, the higher we expect the raindrops to be, and they're almost always going to be at least as high as the rain is far away. We'd expect raindrops to usually be less than 6km high based on this site, so the rain should be closer than that at least.

Using the timestamp in the Twitter post (changed to New York time, 5:57am) and this site, we can actually work out where the sun was in relation to New York when the picture was posted (and, hopefully, taken). It turns out it was not long after sunrise, so the sun was quite low in the sky at about 8.2° in height. The blue line on the graph represents this. The highest points of our secondary original rainbow (green, dotted) and our primary reflected rainbow (black, plain) should be at around the same place, with the original slightly lower, and this looks to be the case on the image. So far so good! Also the primary original (green plain) and secondary reflected (black dotted) are below and above these two.

The next thing we need to check is if the water lines up - the water for the primary reflected rainbow should be about 7 times further than the rain, and the secondary reflected about 12 times further. The direction of the sun at that time of morning was about 81°, so just north of east. If we assume the rain was about 1km away, looking at the map of the location there are two likely-looking patches of shallow, calm water about 7km and 12km in that direction at Oyster Bay and Cold Spring Harbor respectively.
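The distance ratios can be sketched with a little trigonometry, assuming flat ground, flat water at ground level, and looking only at the top of each reflected bow. From where we stand, the top of the reflected bow is at (bow radius + sun elevation), so a drop at that point is tan of that angle times as high as it is far away. Sunlight bouncing off water behind us climbs at the sun's elevation, which fixes how far back the water has to be for the bounced light to reach that drop. The bow radii of 42° and 53° are my assumption (the upper ends of the ranges above), `reflected_bow_geometry` is an illustrative name, and this isn't necessarily how the plot above was made.

```python
import math


def reflected_bow_geometry(bow_radius_deg: float, sun_elevation_deg: float) -> tuple[float, float]:
    """For the top of a reflected-sun rainbow, return (drop height, water distance),
    both as multiples of the horizontal distance to the rain. Flat ground, flat water."""
    h = math.radians(sun_elevation_deg)
    # The reflected bow is centred as far above the horizon as the sun, so its top
    # sits at (bow radius + sun elevation) as seen from our position.
    apex = math.radians(bow_radius_deg + sun_elevation_deg)
    drop_height = math.tan(apex)  # height of the drop we see there, per unit distance to the rain
    # Sunlight bouncing off water a distance D behind us climbs at the sun's elevation,
    # so it reaches height (D + 1) * tan(h) by the time it gets to the drop.
    water_distance = drop_height / math.tan(h) - 1
    return drop_height, water_distance


for label, radius in (("primary", 42), ("secondary", 53)):
    height, water = reflected_bow_geometry(radius, 8.2)
    print(f"{label} reflected bow: drop {height:.1f}x as high, water {water:.1f}x as far")
```

With the sun at 8.2°, this gives water roughly 7.3 and 11.6 times as far away as the rain, close to the 7 and 12 used here.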

So after doing some detective work, it looks like the quadruple rainbow is not only plausible, but the product of a series of unlikely but very possible events!