Having worked on various AppSec teams, there’s a particular situation I’ve experienced many, many times: we get info about a new vulnerability and now we need to know how vulnerable we are.
This can come from a big disclosure like Heartbleed or from an internal bug we’ve found ourselves. In either case the next step is generally the same. We need to build a proof-of-concept (PoC) to test if the bug is present and then scan all potentially vulnerable hosts using the PoC to find out which are vulnerable. Once we have the list of vulnerable hosts we can then start the process of reaching out to owners to get things patched.
A Proof-of-Concept (PoC) != A Scalable Scan
A common problem, of course, is that many of the proof-of-concepts (PoCs) are just short scripts in Python or some other scripting language and are not made to be highly scalable or distributed.
Often security engineers will resort to some Bash trickery to parallelize the PoC or, worse, just run it slowly in a loop. Running this from an EC2 box or the engineer’s personal laptop may be hacky but it’s often how things end up getting done.
Why? Because answers are needed now in order to quickly fix the vulnerabilities. Making a PoC into a distributed scan takes a lot of work, so it often doesn’t get done in favor of more hacky methods.
Easy-Scaling with Refinery
With Refinery, it’s easy to take a basic script and scale it up to thousands of concurrent executions. With Refinery’s editor we can build a scanning pipeline to turn a basic PoC into a distributed scanner in less than the time that it takes to go on a lunch break.
This post will demonstrate how to turn a simple proof-of-concept into a highly concurrent scanner that can scan tens of thousands of hosts in minutes. We’ll use the infamous Heartbleed example and take a very basic Python proof-of-concept and use Refinery to make it into a distributed serverless scanner.
Porting a Heartbleed Proof-of-Concept (PoC) into a Code Block
In Refinery, serverless services are created by connecting blocks together in our editor. For an example diagram with various block types click here. The most basic building block is a Code Block which is simply a script in one of our supported programming languages that takes some input as an argument and returns some data as output. These Code Blocks can also contain libraries (pip, npm, etc) and arbitrary binary dependencies (via Lambda layers). Under the hood Code Blocks are deployed as Lambdas on AWS.
Doing some quick searching, this heartbleed-poc repo on GitHub looks reasonable for our scanner. The entire proof-of-concept is around 200 lines of Python and can be found here. This PoC takes command line arguments to specify the host and port to scan for Heartbleed and returns whether the host is vulnerable (along with some of the dumped memory if it is).
Let’s port this script to a Code Block in Refinery. Start by clicking the “Add Block” button and selecting “Code Block”:
By default, the Code Block language is set to Node. Scroll down in the “Edit Block” pane and change the “Block Runtime” drop-down to “python2.7” (which this PoC is written in):
Great! Now we have a Python 2.7 Code Block that we can port our PoC code into. Copy the Python code from the PoC and paste it into the code editor (click the “Open Full Editor” button at the top of the Edit Block pane to see a full screen view). The following screenshot shows the PoC pasted into the fullscreen code editor:
We need to make some modifications to the PoC to have it work properly with Refinery. This PoC was designed for use on the CLI and not for programmatic usage (or for the Refinery Code Block format). Porting it is pretty straightforward; here’s an example of some code which ports the main function into the Refinery Code Block format:
For context, in Refinery you can chain together Code Blocks of any language to build serverless microservices. As long as you return JSON-serializable data, you can connect Node Code Blocks to Python Code Blocks to Ruby Code Blocks with no issues. The standardized interface for all languages is that each Code Block has a main() function with a Code Block input parameter and a backpack parameter.
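As a rough illustration of that interface, here is a minimal sketch of a ported entry point. The input keys ("host", "port"), the "memory_dump" output key, and the check_heartbleed() helper are assumptions made for this example; the helper stands in for the PoC's actual scanning logic and does no network I/O here.

```python
def check_heartbleed(host, port):
    """Hypothetical stand-in for the PoC's scanning logic.

    The real PoC sends a malformed TLS heartbeat and inspects the reply.
    This placeholder does no network I/O and always reports "not vulnerable".
    """
    return False, ""


def main(block_input, backpack):
    # Assumed input shape: {"host": "...", "port": 443}
    host = block_input["host"]
    port = int(block_input.get("port", 443))

    vulnerable, memory_dump = check_heartbleed(host, port)

    # Return only JSON-serializable data so any downstream block can read it.
    return {
        "host": host,
        "port": port,
        "vulnerable": vulnerable,
        "memory_dump": memory_dump,
    }
```

The backpack parameter is accepted but unused in this sketch.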
To test our ported PoC in the Code Block, set the “Block Input Data” to the following:
We’ve now successfully ported the PoC to a Code Block and we can now scale it up!
Distributed Workers with the Queue Block
Note: To play around with the final project diagram, click here.
Now that we have ported our PoC to a Code Block let’s scale things up to scan a large number of hosts! To do this, we need to add a Queue Block to our project. Click the “Add Block” button and select “Queue Block” to do so:
Now we’ll connect the Queue Block to the Heartbleed PoC block. Click the “Add Transition” button, then click the “Then Transition” option:
Once you click the “Then Transition” button the Heartbleed PoC block will begin to flash. Click on it to add a transition between the two blocks:
Once you’ve done so you’ll see the following:
In Refinery, transitions are used to define the flow of execution for a service. This works in a fairly straightforward way: data returned from a block is passed to the next block pointed to by the transition. In this case, this means that items in the queue are passed to the “Check for Heartbleed” box.
The Queue Block works pretty much how you would expect. When you return an array of items from a Code Block and then transition into a Queue Block every array item will be pushed into the queue. So if you return five items in an array like the following:
Then five items will be put into the queue. This works for arrays of any size the same way, so you can return an array of ten items or ten million items and our platform will ensure everything is inserted into the queue as expected.
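For instance, a Code Block along these lines would enqueue five items when it transitions into a Queue Block. The hostnames are placeholders on the reserved example.com domain, and the item format is arbitrary for this sketch.

```python
def main(block_input, backpack):
    # Each element of the returned list becomes its own queue item.
    return [
        "host-1.example.com",
        "host-2.example.com",
        "host-3.example.com",
        "host-4.example.com",
        "host-5.example.com",
    ]
```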
The Code Block that is transitioned to after the Queue Block will then be invoked with an array of items taken out of the queue. The number of array items passed into the downstream Code Block equals the Queue Block’s “Batch Size” setting (default “1”). So if you use the default of “1”, the Code Block transitioned to from the Queue Block will receive a block input consisting of an array with one queue item in it.
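To make the batching behavior concrete, the following local sketch models how queue items might be grouped before being handed to the downstream block. It is not Refinery's implementation, and the handling of a final partial batch is an assumption.

```python
def batches(items, batch_size=1):
    """Yield successive lists containing at most batch_size queue items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


queue_items = ["a", "b", "c", "d", "e"]

# Default batch size of 1: every invocation sees a one-item list.
print(list(batches(queue_items)))
# [['a'], ['b'], ['c'], ['d'], ['e']]

# Batch size of 2: invocations see up to two items at a time.
print(list(batches(queue_items, batch_size=2)))
# [['a', 'b'], ['c', 'd'], ['e']]
```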
What makes the Queue Block so powerful is that the Code Block downstream of it will automatically be scaled up to process all of the items in the queue. This means that if you put 1 million things in the queue the downstream Code Block will rapidly scale up the number of concurrent executions in order to meet the demand. For safety there is a default ceiling of 1,000 concurrent executions for Refinery users (this limit can be increased by contacting support). This means that you’ll have 1,000 instances of the Heartbleed PoC scanning at the same time. Once a Code Block finishes executing it will immediately pull another item off the queue (if there are more available) until the queue is empty. All of this translates into the ability to make a distributed auto-scaling worker queue in a few clicks (if you’re familiar with Celery, or RabbitMQ, you can think of these as similar examples).
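As a loose local analogy (not how Refinery is implemented), a bounded thread pool shows the same pattern: a fixed ceiling on concurrent workers, each picking up the next item as soon as it finishes the previous one. The drain_queue() name and its parameters are invented for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 1000  # mirrors the default ceiling mentioned above


def drain_queue(items, worker, max_concurrency=MAX_CONCURRENCY):
    """Run worker on every item, using at most max_concurrency threads."""
    if not items:
        return []
    pool_size = min(max_concurrency, len(items))
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Each worker receives a one-item list, like a batch size of 1.
        # Idle threads pick up the next pending item until none remain.
        return list(pool.map(worker, ([item] for item in items)))


results = drain_queue(
    ["a", "b", "c"],
    lambda batch: batch[0].upper(),
    max_concurrency=2,
)
print(results)  # ['A', 'B', 'C']
```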
Back to our project, we now need to actually load some things into the queue! Add another Code Block to the project by again clicking “Add Block” and selecting “Code Block”.
For this Code Block, we need to get a list of IPs, format them into an array of Code Block inputs for our Heartbleed Block, and return them. The following Python code demonstrates an example of this:
As can be seen from the above return data, each item in the array corresponds to the input format for the Heartbleed PoC Code Block we created earlier. This is because each item will be passed as input to that block which is transitioned to from the Queue Block.
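A minimal sketch of such a loader block might look like the following. It assumes the PoC block takes a "host" and a "port" key, and it generates ten random IPv4 addresses purely for demonstration; a real scan would pull hosts from an inventory or asset list instead.

```python
import random


def random_ipv4():
    """Return a random dotted-quad string (may land in reserved ranges)."""
    return ".".join(str(random.randint(0, 255)) for _ in range(4))


def main(block_input, backpack):
    # Each dict matches the assumed input format of the Heartbleed PoC block.
    return [{"host": random_ipv4(), "port": 443} for _ in range(10)]
```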
Now we connect our new Code Block to the Queue Block by clicking the “Add Transition” button, clicking the “Then Transition” option and clicking on the Queue Block. Once you’ve done so you’ll have a project diagram similar to the following:
We now need to make one final small modification to our Heartbleed PoC Code Block. Modify the main() function to be the following:
We’ve added a line to set the block input to the first item in the input array. This is because the input to the Code Block from the Queue Block will be an array.
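In sketch form, the change is a single unwrapping line at the top of main(). As before, the "host" and "port" keys and the check_heartbleed() helper are assumptions standing in for the real PoC code.

```python
def check_heartbleed(host, port):
    """Hypothetical stand-in for the PoC's scanning logic (no network I/O)."""
    return False, ""


def main(block_input, backpack):
    # The Queue Block delivers a list; with a batch size of 1 it holds
    # exactly one item, so unwrap it before doing anything else.
    block_input = block_input[0]

    host = block_input["host"]
    port = int(block_input.get("port", 443))
    vulnerable, memory_dump = check_heartbleed(host, port)

    return {
        "host": host,
        "port": port,
        "vulnerable": vulnerable,
        "memory_dump": memory_dump,
    }
```

If the batch size were raised above 1, the block would need to loop over every item in the list rather than reading only the first.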
Once you’ve done this, we’re ready to deploy our project. Click the “Deploy Project” button followed by clicking the “Confirm Deploy” button to deploy your service:
Once the deployment is complete, click on the first Code Block in the diagram and click the “Code Runner” button:
This opens a panel to kick off the deployed pipeline. Click the “Execute with Data” button to start the pipeline:
The pipeline has now been started. You’ll be able to watch the executions occur live and step through all of the executions that occurred:
In our case we used completely random IP addresses so all of our Heartbleed PoCs ended up encountering uncaught exceptions upon executing. If we click on the block we can see the reason why:
By viewing the Block Execution Logs panel we can see that the PoC raised an uncaught exception and printed a stack trace. The debugger in Refinery allows you to step through each execution and see the full input, terminal output, and return data for each block. You can use the drop-down menu to page through the list of executions for the selected block. For more information, see the Debugging & Logging page in the Refinery docs.
Given that we chose random IPs, the timeout exceptions make sense (choosing ten random IPs from the Internet, it’s not likely that any will be running HTTPS on 443). Since these hosts are not considered vulnerable we don't actually have to fix these exceptions.
However, scrolling through all of the execution logs to find which hosts are vulnerable is pretty painful. Let's update our project so that we filter for only hosts that are actually confirmed vulnerable!
Using the If Transition
Refinery has a number of different transition types which can be used between blocks in the editor. One particularly useful transition type is the "If Transition" which allows for specifying that a transition only be taken if the return data from a block matches a specific format.
We can use this to trigger a Code Block if the return data from our Heartbleed PoC indicates a host is indeed vulnerable. To set this up, add another Code Block to the diagram via the "Add Block" button, followed by clicking on the "Code Block" option. Once you've done this, click on the Heartbleed PoC Code Block and click "Add Transition" followed by selecting the "If Transition" option:
Once you've done this you'll be presented with a small code editor with an example Python conditional statement. For "If Transitions" you can specify Python conditional statements that will check the return data of a block and the transition will occur if the conditional evaluates to "true". Set your conditional statement to the following:
This means that the transition will be taken if the "vulnerable" key of the return data is true (i.e. the host is vulnerable to Heartbleed). Once you've updated the conditional, click on the Code Block you just added to add the transition:
You now have a Code Block which will be executed only with positive matches for hosts with the Heartbleed vulnerability. You can customize this Code Block to handle the alert however you'd like. Even better, since Refinery has a Community Block Repository you can use some of the already-created blocks to handle this. For example, here are some of the Saved Blocks already in the Refinery Community Block Repository:
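The check itself amounts to a one-line Python expression. The sketch below wraps it in a function so it can be run locally; the return_data name is an assumption here, so use whichever variable name the editor's example conditional shows.

```python
def should_transition(return_data):
    # Equivalent in spirit to a conditional such as:
    #     return_data["vulnerable"] == True
    # Using .get() means a missing key is treated as "not vulnerable".
    return return_data.get("vulnerable") is True


print(should_transition({"host": "192.0.2.10", "vulnerable": True}))   # True
print(should_transition({"host": "192.0.2.11", "vulnerable": False}))  # False
```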
Insert Rows into Google Sheet (Insert the vulnerable hosts into a Google Sheet)
Send email via SMTP (Email an alert using your own mail server)
Send email via Mailgun (Send an email about the vulnerability via Mailgun)
Send SMS via Twilio (Send a text message with info about the vulnerable host)
Postgres SQL Query (Insert the vulnerability into a database)
These are just a few examples of existing blocks. Depending on your situation, you can customize them to your specific needs!
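If you would rather write the alert handler yourself, a small formatting block is one possible starting point. This sketch assumes the PoC block's return data includes "host" and "port" keys, and the "subject" and "body" output keys are made up for illustration; they would need to match whatever the downstream email, SMS, or database block expects.

```python
def main(block_input, backpack):
    # Assumed input: the Heartbleed PoC block's return data for a
    # host that was confirmed vulnerable.
    host = block_input["host"]
    port = block_input["port"]

    return {
        "subject": "Heartbleed: {0}:{1} is vulnerable".format(host, port),
        "body": (
            "The host {0} on port {1} was confirmed vulnerable to "
            "Heartbleed. Please contact the owner to get it patched."
        ).format(host, port),
    }
```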
Extending the Pipeline
We've covered how to build a distributed serverless Heartbleed scanner in Refinery. This pipeline is easy to extend with additional scans. Check out the example forks to see how you can extend this example yourself.