cipherdyne.org

Michael Rash, Security Researcher



Software Release: fwknop-2.6.3

The 2.6.3 release of fwknop is available for download. The emphasis in this release is on maximizing code coverage through a new python SPA packet fuzzer, and also on fault injection testing with the excellent fault injection library libfiu developed by Alberto Bertogli. Another important change in 2.6.3 is that all IP resolution lookups in '-R' mode now happen over SSL to make it harder for an adversary to mount a MITM attack on the resolution lookup. As always, manually specifying the IP to allow through the remote firewall is safer than relying on any network communication - even when SSL is involved.

Here is the complete ChangeLog for fwknop-2.6.3:

  • [client] External IP resolution via '-R' (or '--resolve-ip-http') is now done via SSL by default. The IP resolution URL is now 'https://www.cipherdyne.org/cgi-bin/myip', and a warning is generated in '-R' mode whenever a non-HTTPS URL is specified (it is safer just to use the default). The fwknop client leverages 'wget' for this operation since that is cleaner than having fwknop link against an SSL library.
  • Integrated the 'libfiu' fault injection library (available from http://blitiri.com.ar/p/libfiu/). This feature is disabled by default, and requires the --enable-libfiu-support argument to the 'configure' script in order to enable it. With fwknop compiled against libfiu, fault injections are done at various locations within the fwknop sources and the test suite verifies that the faults are properly handled at run time via test/fko-wrapper/fko_fault_injection.c. In addition, the libfiu tool 'fiu-run' is used against the fwknop binaries to ensure they handle faults that libfiu introduces into libc functions. For example, fiu-run can force malloc() to fail even without huge memory pressure on the local system, and the test suite ensures the fwknop binaries properly handle this.
  • [test suite] Integrated a new python fuzzer for fwknop SPA packets (see test/spa_fuzzing.py). This greatly extends the ability of the test suite to validate libfko operations since SPA fuzzing packets are sent through libfko routines directly (independently of encryption and authentication) with a special 'configure' option --enable-fuzzing-interfaces. The python fuzzer generates over 300K SPA packets, and when used by the test suite consumes about 400MB of disk. For reference, to use both the libfiu fault injection feature mentioned above and the python fuzzer, use the --enable-complete option to the test suite.
  • [test suite] With the libfiu fault injection support and the new python fuzzer, automated testing of fwknop achieves 99.7% function coverage and 90.2% line coverage as determined by 'gcov'. The full report may be viewed here: http://www.cipherdyne.org/fwknop/lcov-results/
  • [server] Add a new GPG_FINGERPRINT_ID variable to the access.conf file so that full GnuPG fingerprints can be required for incoming SPA packets in addition to the abbreviated GnuPG signatures listed in GPG_REMOTE_ID. From the test suite, an example fingerprint is:
    GPG_FINGERPRINT_ID     00CC95F05BC146B6AC4038C9E36F443C6A3FAD56
    
  • [server] When validating access.conf stanzas make sure that one of GPG_REMOTE_ID or GPG_FINGERPRINT_ID is specified whenever GnuPG signatures are to be verified for incoming SPA packets. Signature verification is the default, and can only be disabled with GPG_DISABLE_SIG but this is NOT recommended.
  • [server] Bug fix for PF firewalls without ALTQ support on FreeBSD. With this fix it doesn't matter whether ALTQ support is available or not. Thanks to Barry Allard for discovering and reporting this issue. Closes issue #121 on github.
  • [server] Bug fix discovered with the libfiu fault injection tag "fko_get_username_init" combined with valgrind analysis. This bug is only triggered after a valid authenticated and decrypted SPA packet is sniffed by fwknopd:
    ==11181== Conditional jump or move depends on uninitialised value(s)
    ==11181==    at 0x113B6D: incoming_spa (incoming_spa.c:707)
    ==11181==    by 0x11559F: process_packet (process_packet.c:211)
    ==11181==    by 0x5270857: ??? (in /usr/lib/x86_64-linux-gnu/libpcap.so.1.4.0)
    ==11181==    by 0x114BCC: pcap_capture (pcap_capture.c:270)
    ==11181==    by 0x10F32C: main (fwknopd.c:195)
    ==11181==  Uninitialised value was created by a stack allocation
    ==11181==    at 0x113476: incoming_spa (incoming_spa.c:294)
    
  • [server] Bug fix to handle SPA packets over HTTP by making sure to honor the ENABLE_SPA_OVER_HTTP fwknopd.conf variable and to properly account for SPA packet lengths when delivered via HTTP.
  • [server] Add --test mode to instruct fwknopd to acquire and process SPA packets, but not manipulate firewall rules or execute commands that are provided by SPA clients. This option is mostly useful for the fuzzing tests in the test suite to ensure broad code coverage under adverse conditions.

Software Release: fwknop-2.6.0

The 2.6.0 release of fwknop is available for download. This release incorporates a number of feature enhancements such as an AppArmor policy for fwknopd, HMAC authenticated encryption support for the Android client, new NAT criteria that are independently configurable for each access.conf stanza, and more rigorous valgrind verification powered by the CPAN Test::Valgrind module. A few bugs were fixed as well, and similarly to the 2.5 and 2.5.1 releases, the fwknop project has a Coverity defect count of zero. As proof of this, you can see the Coverity high-level defect stats for fwknop on the Coverity Scan site (you'll need to sign up for an account). I would encourage any open source project that is using Coverity to publish their scan results. At last count, it appears that over 1,100 projects are using Coverity, but OpenSSH is still not one of them.

Development on fwknop-2.6.1 will begin shortly, and here is the complete ChangeLog for fwknop-2.6.0:

  • (Radostan Riedel) Added an AppArmor policy for fwknopd that is known to work on Debian and Ubuntu systems. The policy file is available at extras/apparmor/usr.sbin/fwknopd.
  • [libfko] Nikolay Kolev reported a build issue with Mac OS X Mavericks where local fwknop copies of strlcat() and strlcpy() were conflicting with those that already ship with OS X 10.9. Closes #108 on github.
  • [libfko] (Franck Joncourt) Consolidated FKO context dumping function into lib/fko_util.c. In addition to adding a shared utility function for printing an FKO context, this change also makes the FKO context output slightly easier to parse by printing each FKO attribute on a single line (this change affected the printing of the final SPA packet data). The test suite has been updated to account for this change as well.
  • [libfko] Bug fix to not attempt SPA packet decryption with GnuPG without an fko object with encryption_mode set to FKO_ENC_MODE_ASYMMETRIC. This bug was caught with valgrind validation against the perl FKO extension together with the set of SPA fuzzing packets in test/fuzzing/fuzzing_spa_packets. Note that this bug cannot be triggered via fwknopd because additional checks are made within fwknopd itself to force FKO_ENC_MODE_ASYMMETRIC whenever an access.conf stanza contains GPG key information. This fix strengthens libfko itself to independently require that the usage of fko objects without GPG key information does not result in attempted GPG decryption operations. Hence this fix applies mostly to third party usage of libfko - i.e. stock installations of fwknopd are not affected. As always, it is recommended to use HMAC authenticated encryption whenever possible even for GPG modes since this also provides a work around even for libfko prior to this fix.
  • [Android] (Gerry Reno) Updated the Android client to be compatible with Android-4.4.
  • [Android] Added HMAC support (currently optional).
  • [server] Updated pcap_dispatch() default packet count from zero to 100. This change was made to ensure backwards compatibility with older versions of libpcap per the pcap_dispatch() man page, and also because of a report from Les Aker of an unexpected crash on Arch Linux with libpcap-1.5.1 that is fixed by this change (closes #110).
  • [server] Bug fix for SPA NAT modes on iptables firewalls to ensure that custom fwknop chains are re-created if they get deleted out from under the running fwknopd instance.
  • [server] Added FORCE_SNAT to the access.conf file so that per-access stanza SNAT criteria can be specified for SPA access.
  • [test suite] Added --gdb-test to allow a previously executed fwknop or fwknopd command to be sent through gdb with the same command line args as the test suite used. This is a convenience that allows gdb to be launched rapidly when investigating fwknop/fwknopd problems.
  • [client] (Franck Joncourt) Added --stanza-list argument to show the stanza names from ~/.fwknoprc.
  • [libfko] (Hank Leininger) Contributed a patch to greatly extend libfko error code descriptions at various places in order to give much better information on what certain error conditions mean. Closes #98.
  • [test suite] Added the ability to run perl FKO module built-in tests in the t/ directory underneath the CPAN Test::Valgrind module. This allows valgrind memory checks to be applied to libfko functions via the perl FKO module (and hence rapid prototyping can be combined with memory leak detection). A check is made to see whether the Test::Valgrind module has been installed, and --enable-valgrind is also required (or --enable-all) on the test-fwknop.pl command line.

Validating libfko Memory Usage with Test::Valgrind

The fwknop project consistently uses valgrind to ensure that memory leaks, double free() conditions, and other problems do not creep into the code base. A high level of automation is built around valgrind usage with the fwknop test suite, and a recent addition extends this even further by using the excellent CPAN Test::Valgrind module. Even though the test suite has had the ability to run tests through valgrind, prior to this change these tests only applied to the fwknop C binaries when executed directly by the test suite. Further, some of the most rigorous testing is done through the usage of the perl FKO extension to fuzz libfko functions, so without the Test::Valgrind module these tests also could not take advantage of valgrind support. Now that the test suite supports Test::Valgrind (and a check is done to see if it is installed), all fuzzing tests can also be validated with valgrind. Technically, the fuzzing tests have been added as FKO built-in tests in the t/ directory, and the test suite runs them through Test::Valgrind like this:
# prove --exec 'perl -Iblib/lib -Iblib/arch -MTest::Valgrind' t/*.t
Here is a complete example - first, run the test suite like so:
# ./test-fwknop.pl --enable-all --include perl --test-limit 3

[+] Starting the fwknop test suite...

    args: --enable-all --include perl --test-limit 3

[+] Total test buckets to execute: 3

[perl FKO module] [compile/install] to: ./FKO...................pass (1)
[perl FKO module] [make test] run built-in tests................pass (2)
[perl FKO module] [prove t/*.t] Test::Valgrind..................pass (3)
[valgrind output] [flagged functions] ..........................pass (4)

    Run time: 1.27 minutes

[+] 0/0/0 OpenSSL tests passed/failed/executed
[+] 0/0/0 OpenSSL HMAC tests passed/failed/executed
[+] 4/0/4 test buckets passed/failed/executed
Note that all tests passed as shown above. This indicates that the test suite has not found any memory leaks through the fuzzing tests run via Test::Valgrind. But, let's validate this by artificially introducing a memory leak and see if the test suite can automatically catch it. For example, here is a patch that forces a memory leak in the validate_access_msg() libfko function. This function ensures that the shape of the access request conforms to something fwknop expects like "1.2.3.4,tcp/22". The memory leak happens because a new buffer is allocated from the heap but is never free()'d before returning from the function (obviously this patch is for illustration and testing purposes only):
$ git diff
diff --git a/lib/fko_message.c b/lib/fko_message.c
index fa6803b..c04e035 100644
--- a/lib/fko_message.c
+++ b/lib/fko_message.c
@@ -251,6 +251,13 @@ validate_access_msg(const char *msg)
     const char   *ndx;
     int     res         = FKO_SUCCESS;
     int     startlen    = strnlen(msg, MAX_SPA_MESSAGE_SIZE);
+    char *leak = NULL;
+
+    leak = malloc(100);
+    leak[0] = 'a';
+    leak[1] = 'a';
+    leak[2] = '\0';
+    printf("LEAK: %s\n", leak);

     if(startlen == MAX_SPA_MESSAGE_SIZE)
         return(FKO_ERROR_INVALID_DATA_MESSAGE_ACCESS_MISSING);
Now recompile fwknop and run the test suite again, after applying the patch (recompilation output is not shown):
# cd ../
# make
# cd test
# ./test-fwknop.pl --enable-all --include perl --test-limit 3
[+] Starting the fwknop test suite...

    args: --enable-all --include perl --test-limit 3

    Saved results from previous run to: output.last/

    Valgrind mode enabled, will import previous coverage from:
        output.last/valgrind-coverage/

[+] Total test buckets to execute: 3

[perl FKO module] [compile/install] to: ./FKO...................pass (1)
[perl FKO module] [make test] run built-in tests................pass (2)
[perl FKO module] [prove t/*.t] Test::Valgrind..................fail (3)
[valgrind output] [flagged functions] ..........................fail (4)

    Run time: 1.27 minutes

[+] 0/0/0 OpenSSL tests passed/failed/executed
[+] 0/0/0 OpenSSL HMAC tests passed/failed/executed
[+] 2/2/4 test buckets passed/failed/executed

This time two tests fail. The first is the test that runs the perl FKO module built-in tests under Test::Valgrind, and the second is the "flagged functions" test which compares test suite output looking for new functions that valgrind has flagged vs. the previous test suite execution. By looking at the output file of the "flagged functions" test it is easy to see the offending function where the new memory leak exists. This provides an easy, automated way of memory leak detection that is driven by perl FKO fuzzing tests.
# cat output/4.test
[+] fwknop client functions (with call line numbers):
       10 : validate_access_msg [fko_message.c:256]
        6 : fko_set_spa_message [fko_message.c:184]
        4 : fko_new_with_data [fko_funcs.c:263]
        4 : fko_decrypt_spa_data [fko_encryption.c:264]
        4 : fko_decode_spa_data [fko_decode.c:350]
Currently, there are no known memory leaks in the fwknop code, and automation built around the Test::Valgrind module will help keep it that way.
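
The "flagged functions" comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea (parse the flagged function list from two runs and report what is new), not the logic in test-fwknop.pl. The line format is taken from the output/4.test listing shown earlier, and the function names parse_flagged() and new_flagged_functions() are made up for this example.

```python
import re

# Matches lines such as "       10 : validate_access_msg [fko_message.c:256]"
FLAGGED_LINE = re.compile(r"^\s*(\d+)\s*:\s*(\w+)\s*\[([^\]:]+):(\d+)\]\s*$")


def parse_flagged(text):
    """Return a dict mapping function name -> number of times it was flagged."""
    flagged = {}
    for line in text.splitlines():
        match = FLAGGED_LINE.match(line)
        if match:
            count, func = int(match.group(1)), match.group(2)
            flagged[func] = flagged.get(func, 0) + count
    return flagged


def new_flagged_functions(previous_text, current_text):
    """List (function, count) pairs flagged now but not in the previous run.

    The result is sorted with the most frequently flagged function first.
    """
    previous = parse_flagged(previous_text)
    current = parse_flagged(current_text)
    fresh = [(func, count) for func, count in current.items()
             if func not in previous]
    return sorted(fresh, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    previous_run = ""  # assumption: the earlier clean run flagged nothing
    current_run = """\
[+] fwknop client functions (with call line numbers):
       10 : validate_access_msg [fko_message.c:256]
        6 : fko_set_spa_message [fko_message.c:184]
        4 : fko_new_with_data [fko_funcs.c:263]
        4 : fko_decrypt_spa_data [fko_encryption.c:264]
        4 : fko_decode_spa_data [fko_decode.c:350]
"""
    for func, count in new_flagged_functions(previous_run, current_run):
        print(f"{count:4d}  {func}")
```

Sorting by count puts validate_access_msg at the top of the report, which is where the artificial leak was introduced.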

Port Knocking: Why You Should Give It Another Look

(Update 10/20/2013: There is a Reddit comment thread going on this post.)

It has been a decade since Port Knocking was first introduced to the security community in 2003, so it seemed fitting to recap how far the concept has evolved. Much effort has gone into solving architectural problems with PK systems, and today Single Packet Authorization (SPA) embodies the primary benefits of PK while fixing its limitations.

There are noted security researchers on both sides of the debate as to whether PK/SPA has any security value, but it is interesting that researchers who don't find value seem to concentrate on aspects of PK/SPA that have little to do with the chief benefit: cryptographically strong concealment. At least, this is the property offered by Single Packet Authorization but admittedly not necessarily with Port Knocking. Let's first go through some of the more common criticisms of PK/SPA, and show what the SPA answer is to each one. For those that haven't considered SPA in the past, perhaps it is time to give it a second look if for no other reason than to propose a method for breaking it.

1) Isn't PK/SPA just another password?

Suppose I hand you an arbitrary IP, say 2.2.2.2, that is running a default-drop firewall policy. As an attacker, you scan 2.2.2.2 and can't get any information back whatsoever. It doesn't respond to pings, Nmap cannot detect any TCP or UDP service under all scanning techniques, and any Metasploit module that relies on a TCP connect() call is ineffective. In the absence of a routing issue, it is safe to assume there is a firewall or ACL blocking all incoming scans. It is not feasible to tell whether there are any services listening (SSH or otherwise), and it is also not feasible to tell whether there is a PK/SPA daemon either - the firewall is being used to do what it does best: block network traffic. The point is that there is no information coming back from the target. A PK/SPA daemon may be deployed, but the passive nature of PK/SPA makes it undiscoverable [1].

As a thought experiment, what would it take to make PK/SPA "just another password"? Well, if the PK/SPA daemon listened on a TCP socket and advertised itself via a server banner (like "fwknop-2.5.1, enter password for SSH access:") this would go a long way. Then Nmap would once again become an effective tool for finding the PK/SPA daemon, and an attacker could start to try different passwords. In other words, the daemon would no longer be passive, which is the whole point of PK/SPA to begin with.

But wait, you might say "but attackers can just try to brute force passive PK/SPA daemons anyway (even though they can't scan for them directly) and see if a port opens up", which brings us to:

2) Can PK/SPA be brute-forced?

Port Knocking implementations that use simple shared sequences are certainly vulnerable to brute forcing, and they are also vulnerable to replay attacks. These vulnerabilities (among other problems) were primary motivators for the development of SPA, and any modern SPA implementation is not vulnerable to either of these attacks. For example, fwknop uses AES in CBC mode authenticated with an HMAC SHA-256 in the encrypt-then-authenticate model, and both the encryption and HMAC keys (256 and 512 bits respectively for a total of 768 bits) are generated from random data in --key-gen mode. Further, fwknop can leverage GnuPG instead of AES, and 2048-bit GnuPG keys are fully supported. If it were practical to brute force fwknop encryption and authentication, then it would also be practical to brute force a lot of other cryptographic software too. Hence, fwknop is not vulnerable to such attacks in any practical sense [2].
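
As a rough illustration of the encrypt-then-authenticate model and the key sizes mentioned above, the following Python sketch generates a random 256-bit encryption key and a 512-bit HMAC key, appends an HMAC SHA-256 tag to a ciphertext, and refuses to hand the ciphertext to a decryption step unless the tag verifies. It is not fwknop's packet format or code: the ciphertext is treated as opaque bytes (no AES is performed here), and the function names are invented for this example.

```python
import hashlib
import hmac
import secrets

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def generate_keys():
    """Return (encryption_key, hmac_key) built from random data."""
    encryption_key = secrets.token_bytes(32)  # 256 bits
    hmac_key = secrets.token_bytes(64)        # 512 bits
    return encryption_key, hmac_key


def authenticate(ciphertext, hmac_key):
    """Append an HMAC SHA-256 tag computed over the ciphertext."""
    tag = hmac.new(hmac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def verify(blob, hmac_key):
    """Return the ciphertext if the tag is valid, otherwise None.

    In the encrypt-then-authenticate model this check happens first,
    so unauthenticated data never reaches the decryption routine.
    """
    if len(blob) < TAG_LEN:
        return None
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(hmac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return ciphertext


if __name__ == "__main__":
    _enc_key, mac_key = generate_keys()
    blob = authenticate(b"opaque ciphertext bytes", mac_key)
    print(verify(blob, mac_key) is not None)            # untampered data
    tampered = bytes([blob[0] ^ 0x01]) + blob[1:]
    print(verify(tampered, mac_key) is not None)        # modified data
```

A design consequence worth noting is that a brute force attempt against the encryption key cannot even begin until a packet passes the HMAC check, since everything else is discarded first.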

Beyond this, from 1) above remember that the very existence of the SPA daemon is not discoverable by an attacker. For the average adversary, interacting with the SPA daemon must be done blindly "by chance". So, even if the target is running an SPA daemon (which the attacker can't see), which implementation should the attacker try to brute force? If the target is running Moxie Marlinspike's knockknock (which uses AES in CTR mode authenticated with a truncated HMAC SHA-1), then the attacker needs to try and brute force the daemon with crafted TCP SYN packets via the following fields: TCP window size, TCP sequence, TCP acknowledgement, and the network layer IP ID. On the other hand, if the target is running fwknop, then the attacker would have to try and brute force the fwknopd daemon with UDP payloads to a port that the fwknopd pcap filter statement allows (although fwknopd can also be configured to only accept SPA payloads over ICMP instead). Should the attacker try to brute force the fwknop AES-CBC + HMAC SHA-256 mode? Or the GnuPG + HMAC SHA-256 mode? Further, the fwknopd daemon can place restrictions on the services that an authenticated client is authorized to request via the access.conf file. There are a lot of bits adding up, and the entire time the attacker doesn't even know whether an SPA daemon is actually running let alone which one.

Unfortunately for the attacker it gets worse. Even if the attacker could somehow brute force both the encryption and authentication steps in fwknop or other SPA software, to which service should the attacker try to make a connection? No service has to listen on the default port, so if a connection to SSH isn't answered should the attacker scan the target looking for a service? Maybe SPA is being used to conceal an IMAP daemon, a webserver, or OpenVPN instead of SSHD? Further, because the SPA daemon never acknowledges anything to begin with, the attacker can only infer that a brute force attempt was successful by seeing if a service is finally available after each attempt. So, in order for the attacker to be effective, the work flow for every brute force attempt should be: (1) brute force attempt, (2) scan for SSH (just because SPA is usually used to conceal SSH), (3) full scan if the SSH scan doesn't work. This starts to become extremely noisy to say the least. Even if full scans aren't also used, the volume of traffic just to attempt brute force operations by themselves is prohibitively huge.

Regardless of the attacker work flow, which service is concealed, or how heavily the attacker scans the target after every attempt, the brute force resistance offered by fwknop is fundamentally provided through the strength of cryptography. Getting past the authentication step alone would require breaking the 512-bit HMAC SHA-256 key (or forcing a hash collision against SHA-256), and fwknop even supports HMAC SHA-512 if one prefers that instead. Beyond this, the encryption key would also need to be brute-forced. For all intents and purposes it is not practical to brute force fwknop, and similar arguments apply to other SPA software.
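
To put a number on "not practical", here is a small worked calculation in Python. It estimates the expected time for an exhaustive search of a key, assuming on average half of the keyspace has to be tried. The guess rate of 10^12 attempts per second is an arbitrary and very generous assumption made only for this example, and it ignores the fact that each attempt against an SPA daemon would also require sending a packet and then probing for an open service.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_years(key_bits, guesses_per_second):
    """Expected years to find a key by exhaustive search.

    On average half of the keyspace (2**(key_bits - 1) guesses) is searched.
    """
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e12  # assumed guesses per second (hypothetical attacker)
    for label, bits in (("256-bit encryption key", 256),
                        ("512-bit HMAC key", 512)):
        years = expected_years(bits, rate)
        print(f"{label}: about 10^{math.floor(math.log10(years))} years")
```

Even for the 256-bit encryption key alone, the result is on the order of 10^57 years at this assumed rate, and the 512-bit HMAC key has to be defeated before the encryption key is even relevant.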

3) Does PK/SPA add intolerable complexity?

Every security measure has some associated complexity. Firewalls add complexity. Encryption adds complexity. SSH itself adds complexity. If complexity were always the trump card against higher levels of security, then people would connect admin shells to sockets directly via netcat (or just run telnet) and not worry about encryption. People would not run firewalls because vanilla IP stacks without firewalling hooks would be simpler. Filesystem permissions overlays would be considered insecure because they add complexity. Obviously such viewpoints don't pass muster in the real world. The point is that using more complex code sometimes enables higher levels of security against widely understood threat models. For example, people use SSH and SSL because they want authentication and confidentiality over untrusted networks. Firewalls are used to reduce the attack surface that an adversary can easily communicate with. Application and/or filesystem layer policy controls are engineered to place restrictions on classes of users. The list continues.

This is not to say that complexity is not an important consideration - far from it. Rather, a real security benefit must be realized in order to justify increasing the complexity of a system. In the SPA community, we assert there is a security benefit afforded by passive, cryptographically strong service concealment. How would an attacker try to brute force user passwords via SSH when it is concealed by SPA? How would an attacker exploit even a zero-day vulnerability in a service protected by SPA? How would an attacker exploit a vulnerability in the SPA daemon itself when it is indistinguishable from a system that is running a default-drop firewall policy? (An attacker may interact with the SPA daemon, but it is more or less "by chance" [3].)

In the context of PK/SPA, the real issue is whether the complexity of the code that an attacker can interact with is more or less when PK/SPA is deployed. Anyone protecting networked services is probably already running a firewall, so the firewall usage by the PK/SPA software isn't adding to complexity that wasn't already there. Next, if PK/SPA is used to conceal multiple services (say, SSH, an IMAP daemon, and a webserver all at the same time), a would-be attacker cannot interact with any of those code bases without first getting past the PK/SPA daemon. It is a good bet that the complexity of the PK/SPA daemon is a lot less than the aggregate complexity of all three of those services if they were open to the world. Further, this may still be true even for a single daemon such as SSH as well. In essence, the effective complexity of code that an attacker can interact with may actually go down with PK/SPA deployed - that is, until the PK/SPA daemon can be circumvented (and hence a method for doing this becomes an important question for an attacker).

4) What happens if the PK/SPA daemon dies?

This is a solved problem. Process monitoring software has been around for decades. Many options exist for any OS on which a PK/SPA daemon is deployed. For example, on Ubuntu systems fwknopd is monitored by upstart. Having said this, fwknopd is extremely stable anyway so this feature is hardly ever needed in practice. Still, it is certainly important to ensure that PK/SPA usage does not cause a single point of failure, so using process monitoring software is a good idea.

Summary

There are other criticisms of SPA that are not included in this blog post, and certainly some of them are legitimate such as the fact that SPA requires a specialized client to access concealed services and the fact that "NAT piggybacking" is possible for users on the same network from which an SPA client is used when behind a NAT. However, these points don't generally rise to the level that they invalidate the SPA strategy. This blog post attempts to address those criticisms that could rise to this level were it not for the effort that has gone into solid SPA design by fwknop and other projects. More information on the design goals that guide fwknop can be found in the project tutorial.

In conclusion, SSL uses cryptography to provide authentication and confidentiality, Tor uses cryptography to provide anonymity, and SPA uses cryptography to conceal service existence. For those that assert there is no security value in the latter strategy, it should consequently not be difficult to circumvent. To those in this camp, given the material in this post, please propose a method for breaking SPA.

[1] At least, a PK/SPA daemon is not discoverable by attackers who aren't already in a privileged position to sniff all traffic to and from the target. Clearly, most attackers - including password-guessing botnets - do not fall into this category.

[2] It is possible to weaken the security of fwknop SPA communications by not using --key-gen mode to generate random encryption and HMAC keys and thereby make them more susceptible to brute force attacks. However, this type of problem similarly affects other cryptographic software so it isn't unique to fwknop. And, even if a user doesn't use --key-gen mode, it is still not as easy to brute force fwknopd (which never confirms its existence to an attacker) as other software which an attacker can readily see is available to exploit.

[3] The security of fwknopd code itself is nonetheless quite important, and this is why the fwknop project uses static analysis provided by Coverity (and has a Coverity defect count of zero), the Clang static analyzer, and also implements dynamic analysis with valgrind via a comprehensive test suite.

TCP Options and Detection of Masscan Port Scans

After Errata Security scanned port 22 across the entire Internet earlier this month, I thought I would go back through my iptables logs to see how the scan appeared against one of my systems. Errata Security published the IP they used for the scan as 71.6.151.167 so that it is easy to differentiate their scan from all of the other scans and probes:
[minastirith]# grep 71.6.151.167 /var/log/syslog | grep "DPT=22 "
Sep 12 21:19:15 minastirith kernel: [555953.034807] DROP IN=eth0 OUT= MAC=00:13:46:11:11:11:78:cd:11:6b:11:7e:11:00 SRC=71.6.151.167 DST=1.2.3.4 LEN=40 TOS=0x00 PREC=0x20 TTL=241 ID=17466 PROTO=TCP SPT=61000 DPT=22 WINDOW=0 RES=0x00 SYN URGP=0
Interestingly, the SYN packet that produced the log message above does not contain TCP options. The LOG rule in the iptables policy was built with the --log-tcp-options switch, and yet the OPT field for TCP options is not included. Looking through the Masscan sources, TCP SYN packets are created with the tcp_create_packet() function which does not appear to include code to set TCP options, and neither does the default template used for describing TCP packets. This is most likely done in order to maximize performance - not from the perspective of the sender since a static hard-coded TLV encoded buffer would have done nicely - but rather to minimize the time that scanned TCP stacks must spend processing the incoming SYN packets before a response is made. While this processing time is trivial for individual TCP connections, it would start to become substantial when trying to rapidly scan the entire IPv4 address space.

A consequence of this strategy is that SYN packets produced by Masscan look different on the wire from SYN packets produced by most operating systems (at least according to p0f), and they also differ from SYN scans produced by Nmap (which do include options as we'll see below). This is not to say that every SYN packet without options necessarily comes from Masscan. There are operating systems in the p0f signature set (such as Ultrix-4.4) that do not include options, and the Scapy project also seems not to set options by default when producing SYN scans like this. In addition, it looks like Zmap also does not include TCP options in SYN scans. For reference, here are three iptables LOG messages for SYN packets produced by a standard TCP connect() call from an Ubuntu 12.04 system, an Nmap SYN (-sS) scan, and Scapy (source and destination IPs and MAC addresses have been obscured):
### TCP connect() SYN:
Sep 29 21:16:00 minastirith kernel: [171470.436701] DROP IN=eth0 OUT= MAC=00:13:46:11:11:11:78:cd:11:6b:11:7e:11:00 SRC=2.2.2.2 DST=1.2.3.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=15593 DF PROTO=TCP SPT=58884 DPT=12345 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A0CE97C070000000001030306)

### Nmap SYN (-sS):
Sep 29 21:16:12 minastirith kernel: [171482.094163] DROP IN=eth0 OUT= MAC=00:13:46:11:11:11:78:cd:11:6b:11:7e:11:00 SRC=2.2.2.2 DST=1.2.3.4 LEN=44 TOS=0x00 PREC=0x00 TTL=39 ID=26480 PROTO=TCP SPT=48271 DPT=12345 WINDOW=4096 RES=0x00 SYN URGP=0 OPT (020405B4)

### Scapy SYN via: sr1(IP(dst="1.2.3.4")/TCP(dport=12345,flags="S"))
Sep 29 21:35:15 minastirith kernel: [172625.207745] DROP IN=eth0 OUT= MAC=00:13:46:11:11:11:78:cd:11:6b:11:7e:11:00 SRC=2.2.2.2 DST=1.2.3.4 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=1 PROTO=TCP SPT=20 DPT=12345 WINDOW=8192 RES=0x00 SYN URGP=0
As a result, we can infer that SYN scans without options may originate from Masscan (the Scapy example notwithstanding, since Scapy's usage as a port scanner is probably a lot less common than Masscan usage), and this will become more likely true over time as Masscan's popularity increases. (It is included as a tool leveraged by Rapid7's Sonar project for example.)
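
For readers who want to see what the OPT hex strings in the log messages above actually contain, here is a short Python sketch that walks the type-length-value encoding of TCP options. The option kind numbers are the standard ones (2 = maximum segment size, 3 = window scale, 4 = SACK permitted, 8 = timestamps), and the function name parse_tcp_options() is just a label for this example.

```python
# Names for the option kinds that appear in the log messages above.
OPTION_NAMES = {
    2: "MSS",
    3: "WScale",
    4: "SACK-permitted",
    8: "Timestamps",
}


def parse_tcp_options(hex_string):
    """Decode an iptables OPT hex string into a list of (name, value_bytes)."""
    data = bytes.fromhex(hex_string)
    options = []
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == 0:            # End of Option List: single byte, stop here
            options.append(("EOL", b""))
            break
        if kind == 1:            # No-Operation: single byte of padding
            options.append(("NOP", b""))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated option header")
        length = data[i + 1]     # length covers kind + length + value
        if length < 2 or i + length > len(data):
            raise ValueError("invalid option length")
        value = data[i + 2:i + length]
        options.append((OPTION_NAMES.get(kind, f"kind-{kind}"), value))
        i += length
    return options


if __name__ == "__main__":
    # OPT fields copied from the connect() and Nmap log messages above.
    for opt in ("020405B40402080A0CE97C070000000001030306", "020405B4"):
        decoded = parse_tcp_options(opt)
        print([(name, value.hex()) for name, value in decoded])
```

Decoded this way, the connect() SYN carries MSS, SACK-permitted, timestamps, a NOP, and window scale, the Nmap SYN carries only an MSS of 0x05B4 (1460), and the Masscan and Scapy SYN packets carry nothing at all.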

The upcoming 2.2.2 release of psad will include detection of scans that originate from Masscan, and if you check out the latest psad sources from git, this feature has already been added. To make this work, the iptables LOG rule needs to be instantiated with the --log-tcp-options switch, and a new psad.conf configuration variable "EXPECT_TCP_OPTIONS" has been added to assist. When looking for Masscan SYN scans, psad requires at least one TCP options field to be populated within a LOG message (so that it knows --log-tcp-options has been set for at least some logged traffic), and after seeing this then subsequent SYN packets with no options are attributed to Masscan traffic. All usual psad threshold variables continue to apply however, so (by default) a single Masscan SYN packet will not trigger a psad alert. Masscan detection can be disabled altogether by setting EXPECT_TCP_OPTIONS to "N", and this will not affect any other psad detection techniques such as passive OS fingerprinting, etc.
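
The detection logic described above can be sketched roughly as follows in Python. This is not psad's code, and it deliberately leaves out psad's threshold variables. It only shows the ordering requirement: SYN packets without options are treated as Masscan candidates only after at least one logged TCP packet has shown an OPT field, which demonstrates that --log-tcp-options is in effect. The log lines are abbreviated versions of the ones shown earlier, and the function names are invented for this example.

```python
import re

OPT_RE = re.compile(r"\bOPT \(([0-9A-Fa-f]+)\)")
SRC_RE = re.compile(r"\bSRC=(\S+)")


def is_bare_syn(tokens):
    """True for a logged TCP packet with SYN set and ACK not set."""
    return "PROTO=TCP" in tokens and "SYN" in tokens and "ACK" not in tokens


def candidate_masscan_sources(log_lines):
    """Return source IPs of option-less SYN packets seen after any OPT field."""
    options_seen = False   # have we proof that TCP options are being logged?
    suspects = []
    for line in log_lines:
        tokens = line.split()
        if "PROTO=TCP" not in tokens:
            continue
        if OPT_RE.search(line):
            options_seen = True
            continue
        if options_seen and is_bare_syn(tokens):
            match = SRC_RE.search(line)
            if match:
                suspects.append(match.group(1))
    return suspects


if __name__ == "__main__":
    logs = [
        # Nmap SYN with an MSS option (abbreviated from the log above)
        "DROP IN=eth0 SRC=2.2.2.2 DST=1.2.3.4 LEN=44 PROTO=TCP SPT=48271 "
        "DPT=12345 WINDOW=4096 RES=0x00 SYN URGP=0 OPT (020405B4)",
        # Masscan SYN with no options (abbreviated from the log above)
        "DROP IN=eth0 SRC=71.6.151.167 DST=1.2.3.4 LEN=40 PROTO=TCP "
        "SPT=61000 DPT=22 WINDOW=0 RES=0x00 SYN URGP=0",
    ]
    print(candidate_masscan_sources(logs))
```

Without the "options seen" precondition, a LOG rule built without --log-tcp-options would cause every SYN packet to look like Masscan traffic, which is why the ordering matters.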

Design of a New 'xbits' Cross-Stream IDS Keyword

In the previous blog post a proposal was made for a new Snort and Suricata keyword "xbits" for cross-stream signature matching. That post had little discussion of implementation tradeoffs, and some have reacted to it by saying that it is difficult to properly design and implement the type of cross-stream state tracking that would be necessary for xbits to work. I agree. However, the initial xbits proposal did not assume that such an implementation would be easy or straightforward, and this blog post will attempt to illustrate where some of the pitfalls are likely to be. In the end, I'm confident that it is possible to develop something similar to xbits. Victor Julien, lead developer of Suricata, commented on my Google+ posting on the xbits keyword to state that Suricata has been considering implementing something similar to xbits for a while.

First, before diving into xbits itself, I would argue that there is already precedent in IDS/IPS engines for detecting an important class of communications that cross multiple transport layer "conversations" (using this term loosely for a moment): port scans and sweeps. Detecting such traffic is mostly about setting thresholds on various things such as the number of ports and IP addresses that are contacted within a given period of time, prioritizing on sets of ports that are usually associated with important services (with sweeps for certain ports sometimes spiking after a new vulnerability is discovered), and differentiating TCP flags that are used (a TCP FIN scan looks a lot different than a connect() scan and indicates some things about the adversary such as privileged OS access). By definition, the raw ability to differentiate scans and sweeps vs. normal traffic requires the capability of keeping some state across transport layer conversations. It just so happens that Snort and Suricata track this state within dedicated preprocessors and do not also expose port scan detection configuration in the signature language itself. By contrast, other preprocessors do offer signature language interfaces such as stream5 with the flow keyword. To be clear, I'm not at all advocating that configuration aspects of the sfPortscan preprocessor actually belong in the signature language (that would be unnatural to say the least) - I'm merely making the point that the idea of maintaining some state across transport layer conversations is something that Snort and Suricata already do. So, in this area at least, such a concept is not foreign.

Traffic Visibility

Now, in terms of factors that would affect an xbits implementation, it should be noted that not every IDS/IPS necessarily has a global view of all traffic on a given network. This can be the result of several different factors that depend not only on the physical hardware, but also how the IDS/IPS itself is developed (multi-threaded or not), and configured. Starting with the hardware, I've seen major IDS deployments that require multiple IDS appliances working together in order to inspect all of the network traffic. Essentially the IDS appliances form a cluster of systems (not in the HPC sense) where portions of the traffic are split across each appliance with a device such as a Gigamon tap. This allows each IDS appliance to handle a fraction of the traffic that it would otherwise have had to inspect, and this in turn enables the cluster as a whole to scale to massive amounts of network traffic. But, this also means portions of the traffic are physically separated from one IDS to the next, and therefore xbits on a single IDS can only apply to the set of transport layer conversations that actually traverse it. So, is there an opportunity for an attacker to evade xbits when deployed in this fashion? Sure, if the attacker knows how the traffic splitting is done, then attacks could most likely be sent against systems in ways that nullify xbits criteria just by using different source networks to force each individual IDS appliance into having only a limited view of a cross-stream attack. There are potential evasions at every level, but many attackers are not going to have access to such traffic splitting details.

Other IDS/IPS architectures such as those that rely on network processors or specialized packet acquisition hardware can also result in limited traffic visibility. Some organizations run multiple Snort processes on a single appliance, and have a network processor split packet data based on IP network ranges or transport layer port number across the Snort instances. Once again, each Snort process has a limited view of the traffic. Similarly, in a multi-threaded IDS/IPS such as Suricata, each thread may be tasked with processing a portion of the total network traffic, and this can result in a limited view within software even if there is no hardware enforced traffic splitting mechanism.

The Stream Preprocessor

Modifying the stream preprocessor to handle the xbits keyword could be tricky. By its nature, xbits would force the stream preprocessor to consider information that is not derived from single transport layer conversations, so locking issues against a global xbits tracking data structure in a multi-threaded context would become important. Also, it would be nice to not place onerous restrictions on xbits such as requiring that a connection close before a set xbit can be tested within a different connection. My guess is that stream preprocessor modifications have previously been a barrier to implementing something like xbits.

xbits Design

Given all of the above, what would be the ideal xbits design? Stepping back for a moment, when any signature language feature is implemented in an IDS, what should be the primary goal? Better attack detection. Performance is certainly a consideration too, and performance features sometimes bleed into the signature language (see the fast_pattern keyword for example), but usually a new signature language feature is added because it enables better detection of threats at acceptable performance levels. Further, some features of the language are important enough to expend lots of CPU cycles and consume precious memory anyway because the detection accuracy would be significantly harmed without them - see the pcre keyword for example. So, we should strive for the ideal xbits design from a detection perspective and let performance and other tradeoffs take place where they must:

  1. Allow an xbit to be set on one transport layer conversation and inspected in a different conversation before the first is closed.
  2. Allow an xbit to be set on a conversation that involves one IP protocol, and tested in a conversation that involves a different IP protocol. E.g. set a bit on a UDP flow and test it in a subsequent TCP connection.
  3. Interface with the current stream preprocessor to allow the setting and testing of xbits to take advantage of existing connection tracking capabilities. This most likely can be implemented as an extension to stream5 without requiring a wholly new preprocessor.
  4. For multi-threaded intrusion detection engines such as Suricata, some of the same tradeoffs that allow port scan detection to apply across threads could be used for xbits. Ideally, xbits would not be limited to traffic that is seen within a single thread.
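
As a rough illustration of design points 1, 2, and 4, here is a minimal Python sketch of a global xbits table guarded by a lock. Everything in it (the XbitStore class, its method names, the injectable clock) is hypothetical and does not correspond to any existing Snort or Suricata interface. It only shows that once bits are keyed by IP pair rather than by connection, the protocol and connection lifetime drop out of the lookup.

```python
import threading
import time


class XbitStore:
    """Hypothetical cross-stream bit table shared by all worker threads."""

    def __init__(self, clock=time.monotonic):
        self._lock = threading.Lock()
        self._bits = {}  # (ip_pair, bit name) -> expiry timestamp
        self._clock = clock

    @staticmethod
    def _ip_pair(src, dst):
        # Order-independent key so either direction maps to the same entry.
        return tuple(sorted((src, dst)))

    def set(self, name, src, dst, expire):
        """Set a bit for this IP pair; it lapses after `expire` seconds."""
        with self._lock:
            self._bits[(self._ip_pair(src, dst), name)] = self._clock() + expire

    def isset(self, name, src, dst):
        """Test a bit, discarding it if its lifetime has run out."""
        key = (self._ip_pair(src, dst), name)
        with self._lock:
            deadline = self._bits.get(key)
            if deadline is None:
                return False
            if self._clock() >= deadline:
                del self._bits[key]
                return False
            return True


if __name__ == "__main__":
    store = XbitStore()
    # One thread sees a UDP packet and sets the bit...
    worker = threading.Thread(
        target=store.set, args=("demo.bit", "2.2.2.2", "1.2.3.4", 30))
    worker.start()
    worker.join()
    # ...and a different thread handling a TCP connection can test it.
    print(store.isset("demo.bit", "1.2.3.4", "2.2.2.2"))
```

A single lock is the simplest choice and also the most contended one. A real engine would presumably shard the table or use per-bucket locks, which is exactly where the performance tradeoffs mentioned above would come into play.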

Crossing the Streams in IDS Signature Languages

This blog post is a proposal for a new SNORT®/Suricata keyword "xbits" that could change how IDS signature developers approach detection of exploits that cross multiple streams. Today, in both Snort and Suricata, it is possible to build up a state machine out of a set of related signatures with the flowbits keyword in order to track how an exploit progresses through a vulnerable application layer protocol. This is an important feature, and is used in many standard Snort signature sets - as of this writing it is in about 6% of all active Snort rules in the Emerging Threats rule set. However, flowbits has an important limitation: it can only apply within single TCP connections or single UDP conversations (forward and reverse flows). So, it is not possible to set a flowbit on one TCP connection and then test whether this flowbit is set in a completely separate connection. This limitation represents an opportunity for innovation, and it is my belief that this shortcoming partially helped to fuel the demand for products offered by SIEM vendors (more on this below).

flowbits Example

First, let's see an example of flowbits usage within two Snort signatures - this comes from the online Snort manual. The following Snort signatures (or "rules" if you like) show a basic example of tracking the IMAP protocol and generate an event for the IMAP "LIST" command but only after a successful login:
alert tcp any 143 -> any any (msg:"IMAP login"; content:"OK LOGIN"; flowbits:set,logged_in; flowbits:noalert;)
alert tcp any any -> any 143 (msg:"IMAP LIST"; content:"LIST"; flowbits:isset,logged_in;)
So, the first signature sets the flowbit "logged_in", and then the second tests whether this flowbit has been set on the same TCP connection along with the "LIST" content match. Note the "noalert" modifier suppresses the alert from the first rule and triggering it merely signals internally to Snort that a necessary precondition for the second signature has been met. It is the second signature that causes an alert to be generated if it is triggered. Note also that the "logged_in" flowbit is set on traffic returned to the client from the IMAP server vs. the "isset" criteria which tests the logged_in flowbit in the second rule on traffic coming from the client. This illustrates the ability of flowbits to place match criteria on communications emanating from both sides of a connection.
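
To see the per-connection scoping in action, here is a small Python model of how the two rules above interact. It is a toy and not how Snort evaluates rules internally. The FlowbitsEngine class, the conn_key helper, and the IP addresses are all invented for the example.

```python
def conn_key(src, sport, dst, dport):
    """Return the same key for both directions of one TCP connection."""
    return tuple(sorted([(src, sport), (dst, dport)]))


class FlowbitsEngine:
    """Toy model of the two IMAP rules, with flowbits kept per connection."""

    def __init__(self):
        self.flowbits = {}  # connection key -> set of flowbit names

    def inspect(self, src, sport, dst, dport, payload):
        bits = self.flowbits.setdefault(conn_key(src, sport, dst, dport), set())
        alerts = []
        # Rule 1: server -> client, sets the flowbit and never alerts
        # (flowbits:noalert).
        if sport == 143 and b"OK LOGIN" in payload:
            bits.add("logged_in")
        # Rule 2: client -> server, alerts only if the flowbit is already
        # set on this same connection.
        if dport == 143 and b"LIST" in payload and "logged_in" in bits:
            alerts.append("IMAP LIST")
        return alerts


if __name__ == "__main__":
    engine = FlowbitsEngine()
    client, server = "10.0.0.5", "10.0.0.9"  # hypothetical addresses
    print(engine.inspect(client, 40000, server, 143, b"a1 LIST"))       # before login
    print(engine.inspect(server, 143, client, 40000, b"a0 OK LOGIN"))   # login reply
    print(engine.inspect(client, 40000, server, 143, b"a2 LIST"))       # same connection
    print(engine.inspect(client, 40001, server, 143, b"a3 LIST"))       # new connection
```

Only the third packet produces an alert. The fourth comes from the same client but over a different TCP connection, so the "logged_in" flowbit is not visible to it.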

Why is flowbits useful?

flowbits is important because it allows the signature developer to let network traffic progress in a natural way and have Snort/Suricata track it as it unfolds. It lets sets of signatures work together as a group for more reliable exploit detection, and there are plenty of exploits that can be detected using the current flowbits implementation. However, what types of exploits are missed because flowbits cannot apply across multiple TCP connections? Put another way, would detection of malicious traffic be significantly improved with a cross-stream flowbits keyword?

A New Keyword: xbits

The proposed xbits keyword would function as follows:

  • Fundamentally, an xbit could be set on one TCP connection or UDP conversation, and tested in a different connection or conversation. This would require a different interface to the stream tracking portions of Snort and Suricata than currently implemented by flowbits.
  • All xbits semantics would match those in flowbits for existing flowbits modifiers such as set, unset, noalert, etc.
  • xbits would offer a new modifier "track" that accepts arguments "ip_pair" (to associate xbits by pairs of IP addresses), and "expire" (to allow xbits to be cleared automatically after a specified number of seconds).
This blog post is not to say that existing Snort rules that use flowbits don't work properly, or that flowbits is fundamentally flawed. Rather, in some cases these rules could be made more reliable and harder to evade with the proposed xbits keyword. It is the sequence of connections that look a certain way that is important, and xbits tries to promote this concept directly into the Snort signature language. Without xbits, the information conveyed by such sequences is lost. One could imagine other cases where having xbits would be useful such as:

  1. Use of a compromised system after successful exploitation. An attacker sends an exploit against a system where the exploit itself is difficult to detect, but following the exploit connection a new successful connection is made to a backdoor port that the exploit forces the compromised system to open (or a connect-back shell can be initiated the other way - the detection rules can be written to take either scenario into account). If the exploit itself can only be described by a signature that may also produce unacceptably high rates of false positives on legitimate traffic, then xbits provides an alternative since this rule never has to trigger any alert at all. Only the use of the compromised system after successful exploitation causes an alert.

    Now, why not just have a signature that triggers on every connection to the backdoor port? Well, sure, but if xbits existed then a set of higher confidence signatures related together by xbits can be created for the same thing. Once again, the sequence of communications contains information that is important for better exploit detection. Further, nothing prohibits both strategies from being used simultaneously; existing non-xbits signatures can be used at the same time.

  2. Better detection of Metasploit traffic, and by extension better detection of sophisticated adversaries. Here are a few examples of Metasploit modules that require multiple streams for successful exploitation:

                - SCADA 7-Technologies IGSS Rename Overflow
                - Apache ISAPI DoS
                - ContentKeeper Web Remote Code Execution

    There are many more modules that require multiple streams, and here is a quick way to identify those that may fall into this category (requires additional investigation). We just look for calls to connect(), connect_udp(), and send_request_cgi():
    $ git grep -c "^[[:space:]]*connect" modules | grep -v ":1" | wc -l
    125
    $ git grep -c "send_request_cgi" modules | grep -v ":1" | wc -l
    154
    

  3. Detection of network trickery that by its nature requires multiple streams. How about detecting SSH connections that have been authenticated with Single Packet Authorization? In this case, one needs a way to trigger an alert if a base64-encoded blob of data goes to UDP port 62201 followed closely after this by an SSH connection to the same system. Note that there are many ways SPA can be deployed where this technique would not be effective (port randomization, SPA packet spoofing, sending SPA packets over Tor, and more), but still it is useful to consider how xbits could be applied to detect styles of communications that can't easily be expressed with current Snort/Suricata signature languages.

Metasploit Example

Let's examine the ContentKeeper Web remote code execution exploit mentioned above in a little more depth, and show how xbits can offer a detection alternative. This exploit attempts to escalate privilege and execute code as either the Apache or root user on the webserver as follows:

  1. Check for a vulnerable version of ContentKeeper by looking for a '500 Internal' error in response to issuing an HTTP request to /cgi-bin/ck/mimencode as seen here. This is an optional step (part of the Metasploit check() function for this exploit), but is generally a good idea since the Metasploit user probably does not want to upload a payload to a patched version of ContentKeeper (and thereby needlessly expand their own risk).
  2. Upload a base64-encoded perl script payload via an HTTP POST to the ContentKeeper webserver. Due to the vulnerability, the payload overwrites a specified file in the webserver filesystem. This connection is made to /cgi-bin/ck/mimencode as seen here.
  3. Wait three seconds after the HTTP upload connection is closed.
  4. Connect to the webserver via an HTTP GET and execute the uploaded payload script via /cgi-bin/ck/<script>. This step can be seen here.
In the context of this post, the important thing to note about the exploit steps above is that two separate HTTP connections are required - one to upload the payload via an HTTP POST, and the second to execute the payload via an HTTP GET. These two requests are required for exploitation regardless of whether the reconnaissance check is made in step one, though in our example rule set below we'll assume the recon check is issued as well. With the xbits keyword, the following signatures can detect this pattern of communications:

  1. Set xbit "Metasploit.ContentKeeper.recon" on initial HTTP connection in Metasploit step 1 above. Track by ip_pair.
  2. Test "Metasploit.ContentKeeper.recon" xbit with 'isset' and if it matches, then set xbit "Metasploit.ContentKeeper.recon_status_is_vuln" on '500 Internal' webserver response. Track by ip_pair.
  3. Look for an HTTP POST that uploads the base64 encoded perl script and test "Metasploit.ContentKeeper.recon_status_is_vuln" xbit. If this xbit is set, then set xbit "Metasploit.ContentKeeper.payload_uploaded" and track by ip_pair.
  4. Look for an HTTP GET to /cgi-bin/ck/ and test the "Metasploit.ContentKeeper.payload_uploaded" xbit. If it is set then generate an event "Metasploit ContentKeeper Web remote code exec".
The complete example rule set can be downloaded here. Obviously these won't run properly in Snort or Suricata until xbits is actually implemented. In this rule set a number of tradeoffs have been made such as: looking for the recon '500 Internal' error check, tracking by IP pair for the recon check and the exploit step, tracking by IP pair between the payload upload connection and the exploit connection, and allowing xbit values to expire after 30 seconds (see the xbits "track" criteria). The attacker is always free to slow things down and not use the same IP pair across each of these connections and still gain successful code execution. But an attacker might not bother to use different source IPs across these connections, and in that case the pattern above becomes a highly reliable indicator of a successful attack. Further, as mentioned previously, individual non-xbits rules can be deployed at the same time to catch each step by itself as well.
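
The four-rule chain described above can also be modeled in a few lines of Python. This is only a sketch of the sequencing logic under several assumptions: HTTP traffic is reduced to simple dictionaries, the recon request is assumed to be a GET, content matching on the base64 payload is skipped, and the script name and IP addresses are invented. The Xbits class and inspect() function are hypothetical and are not part of Snort or Suricata.

```python
EXPIRE = 30  # seconds, matching the expiry discussed above
MIMENCODE = "/cgi-bin/ck/mimencode"
RECON = "Metasploit.ContentKeeper.recon"
VULN = "Metasploit.ContentKeeper.recon_status_is_vuln"
UPLOADED = "Metasploit.ContentKeeper.payload_uploaded"


class Xbits:
    """Hypothetical xbits table tracked by ip_pair with a fixed expiry."""

    def __init__(self):
        self.bits = {}  # (ip_pair, bit name) -> expiry time

    def set(self, pair, name, now):
        self.bits[(pair, name)] = now + EXPIRE

    def isset(self, pair, name, now):
        return self.bits.get((pair, name), float("-inf")) > now


def ip_pair(a, b):
    return tuple(sorted((a, b)))


def inspect(xbits, ev):
    """Apply the four rules to one simplified HTTP event; return an alert or None."""
    pair, now = ip_pair(ev["src"], ev["dst"]), ev["t"]
    if ev["kind"] == "response":
        # Rule 2: '500 Internal' response following the recon request.
        if ev["status"] == 500 and xbits.isset(pair, RECON, now):
            xbits.set(pair, VULN, now)
        return None
    if ev["uri"] == MIMENCODE:
        # Rule 3: payload upload, only meaningful after a vulnerable recon result.
        if ev["method"] == "POST" and xbits.isset(pair, VULN, now):
            xbits.set(pair, UPLOADED, now)
        # Rule 1: any request to mimencode marks possible recon.
        xbits.set(pair, RECON, now)
        return None
    # Rule 4: execution of the uploaded script.
    if (ev["method"] == "GET" and ev["uri"].startswith("/cgi-bin/ck/")
            and xbits.isset(pair, UPLOADED, now)):
        return "Metasploit ContentKeeper Web remote code exec"
    return None


def req(t, src, dst, method, uri):
    return {"t": t, "kind": "request", "src": src, "dst": dst,
            "method": method, "uri": uri}


def resp(t, src, dst, status):
    return {"t": t, "kind": "response", "src": src, "dst": dst, "status": status}


if __name__ == "__main__":
    attacker, server = "2.2.2.2", "1.2.3.4"
    events = [
        req(0.0, attacker, server, "GET", MIMENCODE),            # step 1: recon
        resp(0.1, server, attacker, 500),                        # vulnerable
        req(1.0, attacker, server, "POST", MIMENCODE),           # step 2: upload
        req(4.0, attacker, server, "GET", "/cgi-bin/ck/x.pl"),   # step 4: execute
    ]
    xbits = Xbits()
    print([inspect(xbits, ev) for ev in events])
```

Only the final GET produces an alert. Moving that GET past the 30 second window, or issuing it from a different source IP, yields no alert at all, which is the evasion tradeoff acknowledged above.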

The SIEM Connection

At its core, xbits allows the IDS engine to process network traffic such that associations are made among groups of signatures in a manner that is not restricted to single TCP connections or UDP conversations. There is much analogous precedent for this concept in the SIEM world. SIEM vendors commonly allow security alerts to be built up from multiple independent sources of information such as syslog data, firewall logs, IDS events, webserver logs, netflow data, and more. A good example is "send an alert if source IP a.b.c.d triggers a port sweep event in my IDS, followed by SQL errors via my webserver, followed by a connection initiated back from the webserver to IP a.b.c.d". Having the SIEM automatically trigger an alert based on all of these events coming together in the specified order is far more valuable than trying to arrive at the same result through manual interpretation of each indicator by itself (which isn't practically possible for any decently large network).

This doesn't mean the raw event data is turned off or thrown away - far from it. The event data is simply processed by the SIEM in a way that gives priority to the sequence of events without necessarily caring about the sources of the events themselves (firewall log, etc.). Making an analogy to Snort/Suricata, the absence of xbits would be like a SIEM that could only generate an alert based on looking at one event source at a time. I.e., SSH brute force attempts logged via syslog could not be correlated with, say, a port sweep event from an IDS.

Further, in the absence of xbits, in the Metasploit example above a set of regular Snort rules could be deployed to detect each stage of the Metasploit traffic individually and log this data to a SIEM. From there, the SIEM itself could implement the same logic as xbits does - i.e. generate an alert if the raw events come together in the same sequence as what the xbits rules require. Given that this style of matching is useful, why not implement such a capability within the Snort signature language directly?

Performance?

The implementation of xbits would certainly impact performance, and this may be one reason neither Snort nor Suricata has developed something similar. However, some portion of xbits is clearly possible to implement with a reasonable set of constraints. These constraints may include limitations on the maximum number of xbits, enforcing a maximum amount of time that a set xbit is allowed to remain set, or limiting the number of comparisons that a signature is allowed to trigger for checking whether an xbit is set. It is my belief that a workable performance level could be achieved for many deployments of Snort/Suricata.
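
To give a feel for the first two constraints, here is a minimal Python sketch of an xbits table with a hard cap on the number of entries and a ceiling on how long any bit may stay set. The BoundedXbits class and its parameter names are hypothetical, oldest-first eviction is just one possible policy, and the third constraint (limiting comparisons per signature) is not modeled.

```python
from collections import OrderedDict


class BoundedXbits:
    """Hypothetical xbits table with a size cap and a maximum lifetime."""

    def __init__(self, max_bits=1024, max_ttl=60):
        self.max_bits = max_bits
        self.max_ttl = max_ttl
        self._bits = OrderedDict()  # key -> expiry time, oldest entry first

    def set(self, key, ttl, now):
        # A rule may ask for a long lifetime, but the engine clamps it.
        expiry = now + min(ttl, self.max_ttl)
        self._bits.pop(key, None)  # re-setting a bit makes it the newest entry
        self._bits[key] = expiry
        # Evict the oldest entries once the table is over its cap.
        while len(self._bits) > self.max_bits:
            self._bits.popitem(last=False)

    def isset(self, key, now):
        expiry = self._bits.get(key)
        if expiry is None:
            return False
        if expiry <= now:
            del self._bits[key]
            return False
        return True

    def __len__(self):
        return len(self._bits)


if __name__ == "__main__":
    table = BoundedXbits(max_bits=2, max_ttl=60)
    table.set("a", 30, now=0)
    table.set("b", 30, now=0)
    table.set("c", 30, now=0)  # table is full, so "a" is evicted
    print(table.isset("a", now=1), table.isset("c", now=1), len(table))
```

With bounds like these the memory cost of xbits is fixed up front, at the price of possibly forgetting a bit under load, which an attacker who can generate enough qualifying traffic might try to exploit.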

What About Shared Object (SO) Rules?

The whole point of binary shared object (SO) rules is to allow the signature author to alter the detection capabilities offered in the standard Snort signature language. However, implementing an SO rule is significantly more complex than just writing a regular Snort rule for most signature authors. If the concept of xbits could improve the detection landscape, then it would be desirable to have it built directly into the signature language itself where it is more accessible and readily applied to real world network traffic.

Why Not Just Extend flowbits?

Instead of creating a new Snort/Suricata signature language keyword, why not just extend flowbits so that it can apply to multiple transport layer conversations? Because maintaining backwards compatibility with existing Snort rule sets would become difficult and error prone. Matching across multiple conversations is such a fundamentally different model that it would be better to clearly differentiate this capability with a new keyword.

Conclusion

This blog post proposes an extension to the Snort/Suricata signature language to allow those IDS engines to detect malicious traffic that crosses multiple streams. Attackers do not limit themselves to traffic patterns that must stay within a single transport layer conversation, and neither should standard intrusion detection engines. One only needs to look at the Metasploit project for excellent examples of exploits that span multiple streams. For the record, I had proposed this idea on a panel discussion with Ron Gula and Marty Roesch at the SANS "What Works in Incident Detection and Log Management Summit" in 2010. At the time I didn't use the "xbits" name, but (to me) writing up this blog post has solidified xbits as a decent name for the concept.