In my spare time I’ll often play “Pokemon Go” a popular augmented reality mobile game where players can catch, train, and battle Pokemon. For those unfamiliar with Pokemon Go, you can obtain in-game items by physically visiting “Pokestop” locations in the real world and “spinning” them in the app.
The entire game often hinges on having enough of these items (Pokeballs, eggs, berries, etc). So spinning as many Pokestops as possible is a shared-goal for basically all players.
Pokestops are designed to encourage players to constantly be on the move instead of standing around. For example, once you spin a Pokestop you can’t spin it again for five minutes, so you always want to be heading to the next-nearest stop. Even if you wait around to re-spin the same stop, you are only awarded one fifth of the experience points for doing so (50 XP instead of 250 XP for a new stop).
Naturally, while playing this game, I often wondered: what’s the most efficient path for this? How can I minimize the amount of time spent walking and maximize the number of Pokestops I’ve spun? Of course, this is a problem XKCD can relate to:
This post will dive into my attempt to build a service to find the optimal route for spinning Pokestops. Along the way, I learned quite a bit about constraint solving, vehicle routing algorithms, scraping, and of course: Pokemon Go.
Getting the Data
Naturally the first step to create a service to find an optimal Pokestop route is to get the GPS coordinates for all of the nearby Pokestops. Luckily, the Pokemon Go community has sites which are user-contributed and attempt to map out all of the Pokestops in existence:
By scraping these sites you can get a list of GPS coordinates of nearby stops. The process for this is actually more complicated than it sounds due to a lot of obfuscation, but it is covered in detail in another blog post. Suffice to say, by scraping these online sources we can get the coordinates we need.
More Coordinates More Problems!
Before we get into the fun that is choosing an appropriate routing algorithm we need to first enrich our dataset a bit. A bunch of GPS coordinates is not enough to help us figure out the best route. We need to also calculate the travel distance between all of our coordinates. For example, if we have four points, A-B-C-D we’d need the following distances:
An array of arrays of distances between all of the points is known as a distance matrix. The following is an example distance matrix:
As can be seen in the above table, you can get the travel distance by choosing two points and seeing where they intersect on the table. You'll also notice that the intersection of points which are the same letter is zero because the travel time from a point to itself is of course zero.
Once we calculate this distance matrix, we can then use an appropriate routing algorithm to get the results we’re after!
However, this isn’t as easy as it sounds. After all, most Pokemon Go players can’t fly (although, drones are a clearly under-explored path to Pokemon Go success). You’ll have to walk on streets and sidewalks to get to all of the stops. Since these points exist in the real world, you’ll have to calculate the walking distance taking into account real world roads, sidewalks, and terrain. This requires querying sources which have accurate mapping information.
I investigated several potential sources for this data, such as the following:
After reviewing each API, the OpenRouteServer option seemed to be the most appealing. Mainly because the API is 100% free and has pretty generous quotas. Their Matrix service endpoint allows for calculating the distances between up to 50 points in a single request, and can calculate pedestrian walking distance. Using this API endpoint we can plug in our previously-scraped GPS coordinates and get a distance matrix of distances to plug into whatever routing algorithm we choose.
Choosing the Best Routing Algorithm
Now on to the fun of picking an appropriate routing algorithm. While it’s easy to create a route that hits all of these stops, creating a route that you’d actually want to use is a different story. For example, consider the following:
If you have spots A, B, C, & D and spot D is 1 mile away while spots A, B, C are only 0.2 miles away - you might just want to skip going to spot D altogether. Put simply: you want to be able to drop stops if they are too arduous to complete.
If you have 100 Pokestops within five miles of you, chances are you don’t actually want to walk to all 100. You likely only have time to walk to 10 or 15 stops before you return home. Basically, you want an upper bound on the number of stops to walk to.
These are constraints (ha!) which we will have to consider when picking an appropriate algorithm to use.
If you’re ever looking for a good resource on various routing and constraint solving algorithms, check out Google OR-Tools. It has a library of many different constraint solving algorithms along with documentation and example implementations in various languages. Reading through their documentation and examples helped me understand the requirements and the problem space for route-solving.
Examining some of the routing algorithms listed by Google OR-Tools, the article on penalties and dropping visits seems to be the closest to our needs. This article sets up a problem of having delivery trucks which cannot meet the demand of all of the delivery locations where drop-offs need to be made. Since 100% completion is not possible, it allows for dropping off points and doing the best with whatever capacity restraints are set.
In the real world, an implementation of this algorithm would almost certainly be related to picking up supplies and packages from a warehouse and delivering them to customers. In our situation, however, we don’t have any packages to deliver (or any customers to please). The only restriction we really have is the number of Pokestops we have time to visit. So we hack this algorithm to fit our problem by simply specifying our “vehicle capacity” to be the maximum number of Pokestops we wish to visit, and we set the demand of each stop to be “1”. With these settings, whatever vehicle capacity we pick will be the number of Pokestops in our calculated route. So if we only want to visit 10 Pokestops, we simply set our vehicle capacity to 10 and the constraint solver will analyze all points and pick the optimal ones:
The result of running this algorithm is an ordered array of GPS coordinates representing our optimal route. We’ve almost achieved our goal! But a bunch of GPS coordinates isn’t exactly super useful when you’re on the go with your phone attempting to spin Pokestops...
Formatting and Displaying the Data for on-the-Go
Now that we have the coordinates for our optimal route we just need to display it in an appropriate format for mobile use. The OpenRouteService API we used previously calculates it’s routes via OpenStreetMap (a community-driven mapping project). Given this, if we want our route to not be mangled we’ll have to use a web mapping service that uses OpenStreetMap as its map.
Luckily, OpenRouteService also has a mobile-friendly online map which you can use to display routes. By providing a list of comma-separated GPS coordinates to the “a” parameter, you can get a drawn map of your Pokestop route. The following link is an example calculate optimal route using the Google campus as a testbed:
In order to make it extra mobile friendly, I used the existing Refinery Saved Block for sending Twilio SMS messages in order to make the service into an SMS bot. The bot takes a starting-address sent to it via SMS and returns the Pokestop route URL after finishing it’s calculations:
Since a good number of our customers use our serverless platform to more easily deploy and scale their web bots and scrapers, I thought I’d write a post about a fun scraping challenge I encountered. Solving it required thinking a little bit outside the box I thought I’d share it here since it demonstrates a fairly re-usable approach to scraping heavily-obfuscated sites. This post will dive into how you can use request interception in Puppeteer to beat heavily obfuscated sites that are built to be resistant to scraping.
In this post we show how to scale up a simple security proof-of-concept test into a highly-concurrent serverless distributed scanner. This allows us to rapidly scan thousands of hosts in seconds and quickly identify vulnerable hosts.