Weekly Intelligence Summary Lead Paragraph: 2014-12-05

It’s been a week since news of an incident at Sony Pictures began to surface, and new reports collected in that timespan show the company suffered a significant breach. According to multiple accounts, the individuals behind the attack stole a trove of data, including internal documents, employees’ personal data, and yet-to-be-released films. The FBI issued an advisory regarding wiper malware that may be connected to the incident, and Kaspersky, Symantec and Trend Micro each published additional intelligence on “Destover,” though the link to Sony remains unconfirmed. There’s also speculation that North Korea, motivated by an upcoming Sony Pictures film mocking supreme leader Kim Jong-un, is responsible for the attack. However, there’s no official confirmation of that speculation from Sony, law enforcement or FireEye (whose services the company retained). The VCIC’s more actionable intelligence collections this week include Cylance’s report on Operation Cleaver, a suspected Iranian group responsible for attacking critical infrastructure around the globe, as well as FireEye’s report on a group using phishing to steal insider financial information. And Brian Krebs was at it again this week, announcing that Bebe Stores, Inc. suffered a suspected payment card breach. Unfortunately, the year of the point-of-sale breach continues.

When is an Intelligence Feed Record New?

A common question we grapple with when evaluating intelligence feeds is “If I see the same observable twice, what does it mean?”  This is really two questions in one: “Is my feed sending me the same observation multiple times?” and “Is the second observation part of the original incident or evidence of a new one?”

These are both tough questions to answer.  In the first case, the intelligence feed may not provide any indicator of uniqueness per record, making it impossible to immediately tell whether a record is a duplicate.  The second question is even more complex.  Without significant context for the observation, there is no way to tell what caused it, and therefore no way to tell whether it was a second observation of a single incident or a new incident altogether.

Ultimately, whichever question is being asked, the actionable question is “Do I initiate new incident-handling processes for the second record?”  That may mean adding the observable to detection systems again, resetting detection timers, scanning the network for the observable, and so on.

Let’s rephrase this as a statistical question: “At what point is it statistically unlikely that the second observation is related to the first?”  To answer this, we need to define what we mean by “what point”.  The feature of our data is the time between occurrences of an observable in our intelligence feed, so “what point” refers to the time between one observation of an observable and its next observation.  We’ll use days to measure this, though if your feeds update frequently enough you may prefer hours.

Calculation

In reality, this is a fairly simple question to answer if you have a historical data store of the intelligence stream. To build our data set we use the following steps:

  1. Randomly sample the historical data store for a set number of observables, say 1,000.
  2. Collect every observation of those observables from the intelligence feed.
  3. Sort each observable’s time series.
  4. Calculate and store the number of days between each observation in a list.  This list forms our distribution of days between occurrences.
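The steps above can be sketched in a few lines of Python. The record layout, a list of (observable, timestamp) pairs, is a hypothetical stand-in for whatever your historical data store actually returns:

```python
import random
from collections import defaultdict
from datetime import datetime

def days_between_distribution(records, sample_size=1000, seed=None):
    """Distribution of days between occurrences of sampled observables.

    `records` is a list of (observable, datetime) pairs from the
    historical data store (hypothetical layout, for illustration).
    """
    # Group every observation by its observable (step 2).
    by_observable = defaultdict(list)
    for observable, seen_at in records:
        by_observable[observable].append(seen_at)

    # Step 1: randomly sample a set number of observables.
    rng = random.Random(seed)
    sampled = rng.sample(list(by_observable), min(sample_size, len(by_observable)))

    gaps = []
    for observable in sampled:
        timestamps = sorted(by_observable[observable])   # step 3: sort the time series
        for earlier, later in zip(timestamps, timestamps[1:]):
            gaps.append((later - earlier).days)          # step 4: days between observations
    return gaps
```

The resulting list is the distribution of days between occurrences used in the threshold calculation.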

Once we have this list of days, the usual answer would be to find the value three standard deviations from the mean of the distribution.  However, we have an issue.  Because our values are temporal, they are not independent (i.e., when the next observation occurs probably depends on the previous observation).  We can see this in the data as a clear power-law probability distribution:

Distribution of Days Between Observation Occurrences

This means the data is both long-tailed and skewed.  As such, the mean and standard deviation will not accurately represent the data.  (See Michael Roytman’s talk at BSidesLV for more information on long-tailed distributions.)  Instead we use a robust estimate of scale: the τ estimate proposed by Maronna and Zamar in 2002 (“Robust estimates of location and dispersion of high-dimensional datasets,” Technometrics 44(4), 307–317).  In R, this is available as scaleTau2() in the robustbase library.  If our distribution is stored in D, we can find our estimate of scale by running:

  • install.packages("robustbase")
  • library(robustbase)
  • scaleTau2(D)

(If you would prefer Python, I have transcoded the function here.)

The other issue we need to address is the use of the mean.  The outliers would significantly influence the mean, so we use the geometric median instead.  Since our data is one-dimensional, the geometric median is the same as the standard median.

So to find our cutoff, we take:

  • threshold <- median(D) + 3 * scaleTau2(D)

Or, if you prefer Python:

  • import numpy as np
  • threshold = np.median(D) + 3 * scaleTau2(D)  # scaleTau2 from the transcoded function above
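If you prefer not to depend on the full transcoding, the following is a minimal sketch of the Maronna–Zamar τ scale. It omits robustbase’s finite-sample consistency corrections, so absolute values will differ somewhat from R’s scaleTau2(); the robustness to outliers is the same.

```python
import numpy as np

def tau_scale(x, c1=4.5, c2=3.0):
    """Rough sketch of the Maronna-Zamar tau estimate of scale.

    Consistency corrections from robustbase::scaleTau2 are omitted,
    so this is illustrative rather than a drop-in replacement.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    sigma0 = np.median(np.abs(x - med))        # MAD as the initial scale
    d = np.abs(x - med) / (c1 * sigma0)
    w = np.clip(1.0 - d**2, 0.0, None) ** 2    # smooth weights; far outliers get 0
    mu = np.sum(w * x) / np.sum(w)             # robust location estimate
    rho = np.minimum(((x - mu) / sigma0) ** 2, c2**2)  # capped squared deviations
    return sigma0 * np.sqrt(np.mean(rho))

D = [2, 2, 3, 4, 6, 6, 8, 10, 12, 93, 159]    # toy gap distribution
threshold = np.median(D) + 3 * tau_scale(D)
```

Unlike the standard deviation, the cap on ρ keeps the few very large gaps from inflating the estimate.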

The below list provides the descriptive statistics for the distribution in the histogram above:

  • Samples : 444
  • Mean : 12.4414414414
  • Mode : [2, 93]
  • First Quartile : 3.0
  • Second Quartile/Median : 6.0
  • Third Quartile : 10.0
  • Minimum : 2
  • Maximum : 159
  • Variance : 397.633958283
  • Std. deviation : 19.9407612263
  • Skew : 3.64549527587
  • Kurtosis : 16.0680960044
  • Outlier Threshold : 21.0636588967

We see that both the mean and standard deviation are influenced by the outliers.  Using them to calculate the cutoff would give roughly 72, seven times the third quartile.  Instead, the outlier threshold of 21 provides a much more reasonable value.

Usage

We use the threshold to decide when to consider an observation a new incident and when to treat it as a continuation of an existing incident.  With the threshold, the decision is easy: if the number of days between observations of the observable is greater than the threshold, it is a new incident; if not, it is a continuation of the old one.  In addition, the threshold may provide clues about how long to keep looking for an observable after it has been reported on an intelligence feed.  Both usages are a significant step forward in the practical use of intelligence feeds.
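As a sketch, the decision rule could be wrapped in a small tracker (class and field names here are illustrative, not from the original post):

```python
from datetime import date

class ObservableTracker:
    """Decide whether a repeat sighting of an observable opens a new incident.

    Illustrative sketch; `threshold_days` is the outlier cutoff in days
    computed above (21 for the sample distribution).
    """
    def __init__(self, threshold_days):
        self.threshold_days = threshold_days
        self.last_seen = {}  # observable -> date last observed

    def classify(self, observable, seen_on):
        previous = self.last_seen.get(observable)
        self.last_seen[observable] = seen_on
        if previous is None:
            return "new incident"
        gap = (seen_on - previous).days
        return "new incident" if gap > self.threshold_days else "continuation"
```

A first sighting always opens an incident; after that, only gaps beyond the threshold trigger new incident handling.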

Special thanks to Rob Bird and Allison Miller who helped with some of the sticky statistics.

Weekly Intelligence Summary Lead Paragraph: 2014-11-28

The “Regin” espionageware platform dominated risk intelligence collections. Mashable published a good general summary of Regin. But for Verizon Enterprise clients, the risk is almost certainly greater from the latest Adobe Flash vulnerability. Adobe released an out-of-cycle security bulletin and patch for Flash Player after F-Secure discovered the new vulnerability being exploited via the Angler exploit kit (EK). Angler also appeared in last week’s INTSUM for exploiting a vulnerability from the previous Flash security bulletin. Sony Pictures was the victim of the most significant data breach this week, resulting in the company deciding to take its network down after an extortion attempt.

A Thought Experiment about Shared Credentials

Earlier this year the following question was posed to us:

“What is more likely to get compromised by an external attacker? One account with a strong password shared by 5 people or 5 accounts with strong passwords known only individually?”

The instinctive reaction is to shout the evils of shared passwords, but the specifics of the question made providing an answer more difficult. Internal misuse and the accountability provided by unique user logins were not to be factored in.

Weekly Intelligence Summary Lead Paragraph: 2014-11-21

On Tuesday, Microsoft released MS14-068 out-of-cycle to mitigate a vulnerability in Kerberos that could be exploited to take over Windows domains.  The severity of the impact of a successful attack drove our recommendation for a 30-day deployment and pre-planning for a much shorter fuse if the risk changes.  We’ve been collecting all the reliable intelligence we can regarding last week’s MS14-066 (SChannel), and we have no reports of threats in the wild for it.  We can’t say the same for Adobe’s Flash Player bulletin from last week, because Kafeine of DontNeedCoffee.com discovered the Angler EK exploiting one of the 15 vulnerabilities from the bulletin. And ESET reported that one of the two vulnerabilities patched last week by MS14-064 (OLE) was being exploited through IE for a drive-by download on a news site with an Alexa rank around 11,000. So both vulnerabilities are being exploited in the wild. It doesn’t appear that attack used malvertisements, but the risk that enterprise users will encounter a malvertisement continues to grow.  Lastline Labs reported that 1% of ads served online are malicious, and Trend Micro reported the Flashpack EK in malvertisements dropping Zeus, Dofoil and CryptoWall Trojans. To our colleagues in the U.S., the VCIC extends our wishes for a happy Thanksgiving holiday, and we hope the only thing our clients will see from us for the rest of November is next week’s INTSUM.

Twitter and Information Security awareness

Twitter is giving traditional media a run for its money in many respects, especially when it comes to getting the news out. Over the last few years a common pattern has emerged where news breaks first on Twitter or a comparable social media platform, only to be picked up later by traditional media such as TV, radio and newspapers. In fact, most of the traditional media powerhouses have started incorporating social media into their portfolios, both as a means of reaching a younger, tech-savvy audience and of receiving information about events as soon as they appear on social media. Twitter is by far the most popular choice of social network for breaking news, for subsequent user and community interactions, and for official corporate accounts to interact with the user base at large. With this in mind, we analyze and assess the impact of Twitter when it comes to raising awareness of critical Information Security-related events.

The questions we tried to answer were:

  • How effective is the Twitter platform when it comes to raising awareness about InfoSec in general, and high-profile events in particular?
  • Can InfoSec professionals and organizations, who face a constant uphill battle to keep up with what now seems like an endless barrage of computer and network attacks, use Twitter to their advantage?

On the whole, 2014 has already seen some very high-profile vulnerability disclosures as well as data breaches. The intent of this exercise was to do a cross-sectional study of one such high-profile vulnerability disclosure, the ‘Shellshock’ vulnerability, which made its appearance on Twitter and mainstream media in late September 2014. We chose this particular disclosure for our study because it was very high profile, with the potential to impact a lot of easy targets on the web and on intranets (don’t discount internal threats!). It also came on the heels of another high-profile vulnerability disclosure, ‘Heartbleed’.

Data at a glance

For this study we downloaded some 330,000-odd tweets using Twitter’s search API, spanning the several weeks during which this vulnerability disclosure generated the most activity on both social and mainstream media.
Due to the nature of Twitter conversations, where we have tweets, retweets, favorites, replies and combinations thereof, it was interesting to look at the overall conversation map. This allowed us to gain a high-level understanding of the scope of the conversations and the interactions that happened in them.

Figure 1: Conversation Map

Figure 1 shows a succinct view of the data we sampled. The numbers in the boxes show the number of tweets in each category (e.g. original tweets are tweets which are neither retweets nor replies). For comparison’s sake we also show what percentage each category takes up with respect to its parent and grandparent categories. What is clear is that there was not a whole lot of interaction here apart from retweeting. There were not many back-and-forth conversations (replies and replies to replies made up less than 3%), nor were many tweets favorited (less than 10%). Among retweets, it was mostly the original tweets that tended to be retweeted rather than a subsequent reply to a tweet. All this pointed to the InfoSec Twitter crowd being a small, close-knit group that is not very interactive, at least on the Twitter platform, as far as discussing InfoSec events is concerned.
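The bucketing behind the conversation map can be sketched against the fields the Twitter API returns: a tweet carrying a retweeted_status object is a retweet, one with in_reply_to_status_id set is a reply, and everything else is an original tweet.

```python
def categorize_tweets(tweets):
    """Bucket tweets the way the conversation map does.

    Each tweet is a dict using the Twitter API field names
    `retweeted_status` and `in_reply_to_status_id`.
    """
    counts = {"original": 0, "retweet": 0, "reply": 0}
    for tweet in tweets:
        if tweet.get("retweeted_status") is not None:
            counts["retweet"] += 1    # carries the tweet it rebroadcasts
        elif tweet.get("in_reply_to_status_id") is not None:
            counts["reply"] += 1      # points at the tweet it answers
        else:
            counts["original"] += 1   # neither a retweet nor a reply
    return counts
```

Percentages per category then fall out by dividing each count by the total.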

To elaborate on how minuscule this activity is as far as high-profile Twitter activity goes, we compared it with the July 9th, 2014 soccer World Cup semi-final match between Germany and Brazil (which Germany won 7–1). That single match alone (lasting a little over 90 minutes) generated more than 35 million tweets, with a peak rate of approximately 580,000 tweets/minute (more than our entire sample size spanning several weeks). Although we don’t have a similar conversation map for that activity, the sheer number of tweets compared to our data gave us a good idea of how insignificant our ‘Shellshock’ activity was compared to a high-profile sporting event.

Note: The numbers were calculated from the data we downloaded using the Twitter search API. The search API does not provide all tweets but a sample of them (anywhere from 1% to 40%, without indicating the true volume). The assumption here was that the sampled distribution is faithful enough to the distribution of the total set of tweets related to ‘shellshock’.

Timeline and Activity Trend

One argument Twitter has going for it over traditional media is the speed with which news is received and propagated. There are various reasons behind such a claim, but let’s first see if there is any truth to it in the InfoSec world. Below we see two graphs: Figure 2 shows the per-day tweets and retweets related to ‘shellshock’ around the time when Shellshock dominated the InfoSec news world, and Figure 3 shows the Google Trends data for searches for the phrase ‘shellshock’ during that same time period.

Figure 2: Timeline of Tweets Related to ‘shellshock’

Figure 3: Google Trend for ‘shellshock’

Immediately what we saw was a remarkable similarity between the two trends. The peak of the activity was on Sept 25th and 26th, gradually declining over the next two weeks. We also saw a repetitive pattern of dropping activity over the weekends (Sat the 27th and Sun the 28th, repeated on Oct 4th & 5th and again on the 11th & 12th). This was a clear indicator that the InfoSec community, and people interested in InfoSec in general, were monitoring and searching for more information about ‘Shellshock’ while also interacting and exchanging information about it on Twitter, but that this activity was mostly happening on workdays.

If we look at the timeline of creation dates of the accounts that participated in this conversation in Figure 4, we don’t see a huge spike in new user creation near the time of the event, which tells us that quite a lot of seasoned Twitter pros were engaged in the conversation.

Figure 4: User Account Creation Timeline

Diversity

To analyze the diversity in the Twitter communication, we looked at the top 10 languages used to tweet (Figure 5) and the top 10 user timezones (Figure 6).

Figure 5: Top 10 Languages Tweeted in

Figure 6: Top 10 Timezones in User Profiles

Unsurprisingly, English was the dominant language, with Japanese and Spanish coming in a distant 2nd and 3rd. The United States of America (USA) and Europe dominated the top user timezones. This was expected given their high concentration of IT industries and, as a consequence, greater awareness of InfoSec in these locations. What was surprising was how little Asia, APAC, South America and Africa contributed to the conversation. Does this point to relatively lower InfoSec awareness in these regions? Or does it simply point to a lack of popularity of Twitter there? Given that Twitter has a huge worldwide user base, we suspect the former more than the latter.

Twitter does provide a way for its users to individually geo-tag each tweet with a location, but of the 330,000 or so tweets we sampled, fewer than 1,000 had this information available, so we didn’t deep-dive into per-tweet location analysis.

References

Individual tweets can include hashtags, references to external URLs, and mentions of individual users. Analyzing this information gives us valuable insight into the internal and external resources used in these tweets.

Figure 7: Top 10 Hashtags in All Tweets

Figure 7 shows the top 10 hashtags found in all tweets. The ‘shellshock’ hashtag took a very comfortable 1st spot. What’s more, the previous high-profile vulnerability, ‘heartbleed’, also got a place in the top 10. Because the top 10 hashtags in just the unique tweets (sans retweets) show a similar distribution, we didn’t replicate that plot here. Figure 8 shows the top 10 URLs in all the tweets. It was a bit of a surprise to see a CNET URL getting the top spot. This was due to a single tweet that was retweeted 11K times, pushing that CNET URL to the top. That same tweet is mentioned at the start of this article. When we removed the retweets from the equation, the top URL spot went to a very detailed explanation of the vulnerability by Troy Hunt.

Figure 8: Top 10 URLs in all Tweets

Figure 9 shows the top 10 users who were mentioned in all the tweets. The user account ‘whsaito’ took the top spot on account of a single tweet that was retweeted almost 11K times (the same one with the CNET URL, which appears at the top of this article).

Figure 9: Top 10 Users mentioned in all Tweets

Figure 10 shows the top 10 most active user accounts in terms of the number of tweets sent from those accounts. These accounts can be thought of as the most active in raising awareness about this vulnerability by tweeting about it multiple times.

Figure 10: Top 10 Active Twitter Accounts

To round out this discussion, we also present the top 10 retweeted, replied-to, and favorited user accounts in terms of number of tweets in Figure 11.

Figure 11: Top 10 retweeted/replied/favorited user accounts

Interactions

We already looked at a high-level interaction map in Figure 1. Now let’s dig into it a bit. For starters, let’s see how many followers our InfoSec Twitter users tend to have. In Figure 12 below we show a kernel density plot of the follower counts of each unique user in the data sample. What was very apparent is that the InfoSec crowd is not very popular among the Twitter user base: most accounts have fewer than 1,000 followers. But there are a few high-profile accounts with followers in the millions (the InfoSec rock stars).

Figure 12: Follower Counts of Users

Figure 13: Retweeted and Favorited Counts of Tweets

In Figure 13 above we show how many times unique tweets tend to be retweeted and favorited. This is one way to measure how popular InfoSec tweets tend to be. Most tweets are not retweeted or favorited more than 100 times, a very small number, especially when compared to some of the high-profile trending activity on Twitter. This is a pity, as it again points to the limited reach of InfoSec-related tweets.

Figure 14: Follower Counts v/s Retweet Counts and Favorite Counts

For an interesting comparison, we looked at the relationship between a user’s number of followers and whether it had any impact on a tweet being retweeted or favorited. Conventional wisdom would suggest that there should be some correlation, i.e. the more followers, the more retweets or favorites, but we found evidence to the contrary. Figure 14 seems to suggest that the fewer followers you have, the more your tweets get retweeted and favorited. One possible explanation for this contradiction is that we didn’t take the PageRank effect into account, i.e. a tweet being retweeted by someone who has many more followers than the original account. This was left as an exercise for the future.
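One way to check that relationship is a rank correlation, which the heavy-tailed follower counts cannot dominate. A hand-rolled Spearman sketch (ignoring ties, which a production version would average):

```python
def _ranks(values):
    """0-based ranks of values; ties are not averaged in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = float(r)
    return ranks

def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return _pearson(_ranks(x), _ranks(y))

# Toy data (illustrative, not the study's sample):
# fewer followers paired with more retweets yields a strongly negative rho.
followers = [120, 480, 1500, 9000, 2000000]
retweets = [60, 45, 20, 8, 3]
rho = spearman(followers, retweets)
```

A negative ρ on the real data would quantify the counter-intuitive pattern the figure suggests.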

Conclusion

So what did we learn? Twitter has the potential to reach vast numbers of users and organizations outside the traditional InfoSec community to raise awareness about high-profile security incidents or vulnerability disclosures. However, as things stand now, the InfoSec world is very close-knit and largely unknown outside its own sphere. InfoSec tweets about high-profile events such as the ‘Shellshock’ vulnerability tend to reach a restricted, niche user base, as opposed to, say, a high-profile sporting event such as a soccer World Cup match or the Super Bowl. This was evident from the fact that most tweets related to Shellshock were seen, retweeted, replied to, or favorited by a very small fraction of the Twitter user base.

If InfoSec organizations and professionals want to use Twitter to their advantage, they have their work cut out for them. They need to engage more people and organizations from outside the core InfoSec community in the conversation. The more conversation, the more awareness. InfoSec will never be as popular as soccer or the Super Bowl, but attracting more attention from the non-InfoSec crowd would be a step in the right direction.

Technical Notes

  • For the interested: the data was downloaded using a Python script and Twitter’s search API.
  • It was analyzed and plotted using R and ggplot2.
  • An interesting continuation of this analysis would be to put the data into a graph structure to explore the conversations in more detail and see if we can discover any interesting clusters in the conversations.
  • Another possible research path is performing text analytics on the data to find clusters based on words occurring in the tweets.

Weekly Intelligence Summary Lead Paragraph: 2014-11-14

The majority of intelligence collected by the VCIC this week could easily be organized into two categories: serious vulnerabilities and noteworthy attacks. Microsoft released its hefty November patch update on Tuesday, but the attention wasn’t on the cumulative Internet Explorer update or the patch for a second Windows OLE vulnerability that’s being exploited in a small number of attacks. The focus was on a remote code execution vulnerability in SChannel, Microsoft’s SSL/TLS implementation in Windows. Add it to the long list of crypto bugs we’ve seen this year and be sure to patch it too. Adobe also released a massive update to patch 18 vulnerabilities in Flash Player. Leading the noteworthy attack category is a report from Kaspersky on the Darkhotel APT, which gets its name from its use of hotel networks to target high-profile executives. Meanwhile, two US government agencies, the United States Postal Service (USPS) and the National Oceanic and Atmospheric Administration (NOAA), announced they suffered attacks at the hands of Chinese threat actors. Australian news organizations also reported being breached by Chinese attackers in the buildup to the G20 Summit in Brisbane this weekend. This week’s must-read collections include Cyphort’s report on point-of-sale malware and volume 17 of Microsoft’s Security Intelligence Report.

Context Graph Based Analysis of Apple Pay Domains – Part 3 of 3

In our previous posts we identified Apple Pay domains created after the Apple Pay announcement here.  We then aggregated them in a context graph and analyzed the features of the graph here, and statistically analyzed the individual clusters here.  Companion posts explaining Verum, the context graph system, can be found here and here.  In this post we will manually validate the results of the previous analysis by looking at the individual clusters identified through statistical analysis.

Manual Cluster Validation

To this point in the analysis, everything can be automated.  Rather than manually analyzing the clusters that are potentially malicious, we could classify all of the infrastructure as malicious with a confidence derived from the various measures we examined.  The infrastructure atomic values (IPs, BGP prefixes, domains, etc.) could be added to our IDS and other detection tools.  Once detected, our SIEM could use the confidence to prioritize incidents for investigation and response.

But let’s say we aren’t quite sure we trust our system yet.  Instead, we will now manually analyze the clusters we highlighted to validate that they represent malicious infrastructure.  Let’s first recap the clusters we hoped to look at:

  • Clusters with zero topic nodes. (We will take clusters with no nodes at distances zero, one, and two to keep the set to analyze manageable.)
  • Clusters where more than the 1.5× IQR cutoff (just over 16%) of nodes are directly related to the malice node
  • Clusters above the 95th percentile of nodes two relationships from the malice node
  • Clusters with high aggregate malice scores, starting with the largest score and working our way down
  • Clusters containing highly malicious nodes

To do this, let’s first subset our data to the clusters of interest, filtering the dataset from our previous post by the criteria above.
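The original subsetting code isn’t reproduced here; a rough Python sketch of the same filtering, against a hypothetical per-cluster summary table (every field name and the two score cutoffs are illustrative, not from the original analysis), might look like:

```python
# Hypothetical per-cluster summary rows; field names and score cutoffs
# below are illustrative stand-ins for the real dataset's columns.
clusters = [
    {"id": 41, "min_topic_distance": 5, "pct_adjacent_malice": 0.02,
     "two_hop_malice_percentile": 0.50, "aggregate_malice": 0.91, "max_node_malice": 0.95},
    {"id": 186, "min_topic_distance": 1, "pct_adjacent_malice": 0.40,
     "two_hop_malice_percentile": 0.99, "aggregate_malice": 0.30, "max_node_malice": 0.80},
    {"id": 7, "min_topic_distance": 0, "pct_adjacent_malice": 0.01,
     "two_hop_malice_percentile": 0.10, "aggregate_malice": 0.05, "max_node_malice": 0.10},
]

def clusters_of_interest(rows):
    """Select clusters matching any of the recap criteria above."""
    selected = []
    for c in rows:
        if (c["min_topic_distance"] > 2              # no topic nodes within two hops
                or c["pct_adjacent_malice"] > 0.16   # above the 1.5x IQR cutoff
                or c["two_hop_malice_percentile"] > 0.95
                or c["aggregate_malice"] > 0.5       # illustrative cutoff
                or c["max_node_malice"] > 0.9):      # illustrative cutoff
            selected.append(c["id"])
    return selected
```

Each criterion is a simple column filter, so the union of the five subsets gives the clusters to review.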

Clusters Far from the Topic

Figure 12 Legend

Figure 12

We’ll start by looking at cluster 41, the colorful cluster below and to the left of the three green nodes at the center of Figure 12. It is an IP address node surrounded by four record nodes indicating that it is malicious. It is connected to an ASN, a BGP prefix and a domain. The domain (the center green node with two close green neighbors) is not classified as malicious. From a malice standpoint, the domain (which connects to 21 topic nodes, colored black in Figure 12, through six IP nodes in the four other clusters) is just at the edge of the 1.5× IQR. This appears to be a malicious IP address with a tangential connection to multiple other sets of infrastructure through a single domain that has moved between them. Cluster 41 is most certainly malicious, but not necessarily related to our topic domains in the other clusters. Cluster 51 represents the same situation: a malicious IP address connected to the greater graph through a single domain. The same goes for clusters 69, 83, and 86. Rather than analyze all of the far-distance clusters, we will assume this pattern applies to the other clusters far from the topic nodes.  Next we will analyze clusters with significant percentages of nodes directly related to the malice node.

Clusters Connected to Malice

Figure 20 – Cluster 270

Figure 13 – Cluster 186

We will start with the cluster with the highest percentage of malice nodes, cluster 270. Upon inspection, it is two nodes: a malicious domain and the record of the malicious classification. The same goes for 231, 134, 150, and the rest (some having multiple records). A few are malicious IP addresses that also include ASN and BGP enrichments, slightly diluting the concentration of maliciously classified nodes. These clusters are all malicious, but do not add any value over their original classification. We do have one cluster, 186, shown in Figure 13, which does not fit the previous pattern. It contains two malicious nodes that share a BGP prefix and a few other nodes. This should probably heighten our concern about that subnet, but not necessarily the entire cluster. The two purple nodes represent the malicious IP addresses, with the associated BGP prefix represented by the yellow node in between. Hopefully the aggregated malice score will provide the necessary insight on the BGP prefix.

Clusters Two Relationships from Malice

Figure 14 – Cluster 102

These clusters follow a similar pattern to those directly related to the malice node.  Starting with cluster 102, shown in Figure 14, we see it has a single malicious IP, an associated BGP prefix, and 12 records indicating the IP is malicious. The yellow node represents the center IP node, with the red nodes indicating the records in the cluster. The blue node represents the domain that, while not in the cluster, is directly related to the malicious IP. The IP is still connected to a domain, which should be concerning since the domain is potentially malicious; the topic node’s malice score should indicate that. There are other clusters in this group similar to cluster 186 profiled above; however, most follow the pattern of clusters directly connected to malice, except with IP addresses (rather than domains) whose additional enrichments increase the number of nodes two relationships from malice.

Aggregate Malice Score

Aggregate malice score is the average malice score of all nodes within the cluster. The node malice score is a propagation of malice from the malice node through the graph. As such, it captures not just the direct relationship between a node and malice, but also the malice propagated through relationships from neighbors to the node. The most malicious cluster in this context graph has only two nodes: a record and a domain. That domain, however, is a malicious name server. This plays out for multiple clusters, where a single, highly malicious name server node carries the malice for a rather small cluster. There are some interesting clusters, however. Cluster 41, which popped up previously, shows up again, as do clusters 107 and 71. These tend to be clusters with a highly malicious IP, which then points to a presumably malicious domain.
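A toy version of that propagation and aggregation might look like the following (the decay factor, iteration count, and max-combination rule are assumptions for illustration, not the system’s actual algorithm):

```python
from collections import defaultdict

def propagate_malice(edges, seeds, decay=0.5, iterations=3):
    """Toy malice propagation from seed nodes over an undirected graph.

    Each hop contributes `decay` times the neighbor's score and a node
    keeps the maximum it receives; decay and iteration count are
    illustrative choices, not the system's actual parameters.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    scores = {node: 0.0 for node in neighbors}
    for seed in seeds:
        scores[seed] = 1.0
    for _ in range(iterations):
        updated = dict(scores)
        for node, score in scores.items():
            for nb in neighbors[node]:
                updated[nb] = max(updated[nb], score * decay)
        for seed in seeds:           # seeds stay fully malicious
            updated[seed] = 1.0
        scores = updated
    return scores

def aggregate_malice(scores, cluster):
    """Aggregate malice score: the average node score within a cluster."""
    return sum(scores[n] for n in cluster) / len(cluster)
```

Nodes near many malicious neighbors end up with high scores even without a direct classification, which is exactly the effect the cluster average is meant to surface.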

Figure 15 Legend

Figure 15 – Cluster 65

Cluster 65 is our first interesting cluster in this set. It has five IPs classified as malicious and a whole host of information, as can be seen in Figure 15. The size of a node in Figure 15 corresponds to its malice score.  The five blue IP nodes are directly connected to the malice node. In this cluster, the large green node at the lower left, at the center of various other nodes, is a topic Apple Pay domain. This domain is highly likely to be malicious even though it is not directly classified that way by any of our enrichments or classifications.

This type of pattern continues as the clusters get less malicious. More nodes are added, which decreases the cluster malice score, but the additional nodes reveal additional infrastructure and indicators associated with the known malice. Some contain a topic; others are linked to topics outside the cluster. These larger clusters provide the best means of classifying previously unclassified malice: they have higher internal malice scores, more nodes classified as malicious, and are highly interconnected, highlighting a collection of malicious infrastructure. All of this can be algorithmically identified and the new indicators fed back into our detection tools.

Figure 16 – Cluster 16

We will take a look at the largest cluster, cluster 16. Figure 16 shows this cluster with nodes colored by type, except that maliciously classified nodes are colored red and topic nodes are colored black. In the upper right we see multiple malicious IPs within a single BGP prefix/ASN. Connected to that cluster of malicious IPs are various domains. Those IPs and domains connect into a less malicious portion of the cluster in the lower left. Rather than classify the nodes in the lower section as malicious outright, we may classify them as malicious with a lower confidence than the domains directly connected to the malicious infrastructure in the upper right. This does, however, validate what we suspected about large clusters with significant aggregate malice scores.

Malice Score of Individual Nodes

Figure 18 Legend

Figure 18 – Cluster 0

As outlined above, 81 clusters contain the 1734 most malicious nodes identified as outliers. We will look at a few of those here, focusing on the clusters containing the most malicious nodes. Unsurprisingly, both the most malicious and second most malicious nodes in our graph are within cluster 16. They are the two largest nodes in Figure 16, one red and one brown, in the upper right. The third most malicious node lies within cluster 2. This cluster contains 1390 nodes and, based on analysis of its centrality, represents shared infrastructure: a single *aaS provider’s ASN, name servers, and IP space are very central to the cluster. As such, when threat actors use the shared infrastructure, the malice ‘collects’ in the graph on nodes representing the shared provider’s services. Cluster 0 is a similar cluster and can be seen in Figure 18. It is interesting in that it represents two groups of shared infrastructure (the large clusters left and right of center) with domains associated with both of them (the nodes in the center). The two groups are actually two registrars, one of which used to be a reseller for the other.

When clusters are sorted by their most malicious individual node using the above line of python, the first dozen all have more than 100 nodes each, validating what we suspected: that shared infrastructure dilutes the overall malice but concentrates it on individual nodes. Cluster 76 is the first with fewer than 100 nodes. This cluster also shows up in our aggregate malice score analysis.
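The sorting step can be sketched in plain Python. The cluster names and scores below are hypothetical; the real one-liner operated on the cluster dataframe:

```python
# Illustrative (node -> cluster, node malice score) rows.
node_rows = [
    ("cluster16", 0.95), ("cluster16", 0.90),
    ("cluster2", 0.88), ("cluster76", 0.40),
]

# Take the maximum node malice score per cluster ...
best = {}
for cluster, malice in node_rows:
    best[cluster] = max(best.get(cluster, 0.0), malice)

# ... then rank clusters by that most-malicious-node score, descending.
ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```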

Conclusion

In this series of blog posts we have collected an enriched graph centered on multiple topic domains related to Apple Pay. We have scored individual nodes and clusters, identifying small clusters of malicious nodes, medium-sized clusters of infrastructure, and large clusters of publicly available shared infrastructure. We have classified clusters, as well as individual nodes, as potentially malicious with an associated confidence, both through automated means and through manual analysis. We have also included contextual data in our analysis so that analysts making an assessment later have the data they need to do so. All of this allows the classification and investigation of malicious infrastructure, even before it acts maliciously or when it only indirectly exhibits its malice, in a form that can be consumed either by human analysts or by automated protection, detection, and response systems.

Addendum

While the graph database used for this analysis stores the temporal aspects of the nodes and relationships, the functions implementing the algorithms to time-phase the analysis through the graph have yet to be completed. As such, the analysis in this blog post is done as if all classifications and enrichments occurred at a single time. In reality, IP addresses, domains, and other nodes change classifications and relationships over time. In the future we will conduct analysis through the graph that respects the temporal aspect of the data. For now, understand that the lack of temporal phasing can skew the analysis.

Also, many large clusters in the graph represent publicly shared infrastructure (such as shared whois-obfuscating services, domain registrars, IP space, and name servers). Cluster 2 is an example of this. The analysis correctly identifies these as non-malicious; however, due to their size they contain subsets of malice. This can be corrected by rerunning modularity on the large cluster by itself. (Modularity clustering suffers from a resolution limit.) Alternately, future research into subdividing these large infrastructure clusters into smaller ones that can be individually classified may prove fruitful.
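The rerun-on-a-subgraph idea can be sketched with networkx. The toy graph below (two dense groups joined by a single edge) is an assumed stand-in for one oversized shared-infrastructure cluster; the actual analysis used its own graph store and modularity implementation:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy subgraph extracted from one large cluster: two tight groups
# connected by a single weak bridge edge.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"),   # group 1
                  ("x", "y"), ("y", "z"), ("x", "z"),   # group 2
                  ("c", "x")])                          # bridge

# Rerunning modularity-based community detection on the isolated
# subgraph splits it into the two underlying groups.
subclusters = [set(c) for c in greedy_modularity_communities(G)]
```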

Weekly Intelligence Summary Lead Paragraph: 2014-11-07

Microsoft announced intentions to release sixteen security bulletins next week.  Sixteen is the most the company has released in one month since June 2011 and one under April 2011’s high-water mark.  The VCIC dedicates extra effort to targeted attacks, not because they are currently prevalent among our clients, but because the methods that succeed today will almost certainly be used against Verizon Enterprise clients in the future.  This week those attacks include “TooHash” (GData), “Poisoned Handover” (FireEye), “BlackEnergy 2” (and 3, from Kaspersky) and “Rotten Tomato” (Sophos).  Crimeware continues to evolve, as evidenced by a Cryptolocker campaign targeting Dutch users; several Rovnix campaigns, mostly in Western Europe; ROM, a new POS Trojan based on Backoff; and “WireLurker”, impacting OS X and iOS systems, mostly in China.

Context Graph Based Analysis of Apple Pay Domains – Part 2 of 3

In our previous post, we looked at the initial creation and enrichment of a Context Graph centered around newly created Apple Pay domains.  We looked at the distribution of the Apple Pay topic throughout the graph.  In this post we will statistically compare and contrast individual clusters.  The companion post Introducing Verum: A Context Graph System – Part 2 of 2 provides additional insight into the Verum context graph system for those interested.

Cluster Analysis

To make the data easier to analyze with traditional means, I’ve provided a dataframe with the statistics for each cluster here. This dataframe was created by aggregating values stored on the individual nodes in the graph, including the various topic and malice scores and distances.  I have normalized the values by dividing by the cluster order (its node count).  In the case of aggregate scores, the values are normalized to between zero and one.
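The normalization can be sketched with pandas. The column names here are assumptions (the post only names `aggMaliceScore` later); the data is illustrative:

```python
import pandas as pd

# Illustrative per-cluster statistics: node count ("order") and a raw
# aggregate score summed over the cluster's nodes.
df = pd.DataFrame({
    "order": [2, 10],
    "rawMaliceScore": [1.7, 2.0],
})

# Divide by cluster order so large clusters don't dominate ...
df["aggMaliceScore"] = df["rawMaliceScore"] / df["order"]
# ... then rescale so aggregate scores fall between zero and one.
df["aggMaliceScore"] /= df["aggMaliceScore"].max()
```

Note how the small two-node cluster ends up with the top normalized score even though its raw sum is lower.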

Figure 8

Cluster Topic Analysis

Our first feature of analysis is the percentage of nodes in each cluster at varying distances from the topic nodes.  This data is plotted as a boxplot in Figure 8. If you are unfamiliar with boxplots, I recommend reading up on them here. They are a very efficient way of visualizing non-normally distributed data. In our case, the whiskers represent 1.5x the inter-quartile range (IQR), which is the default for matplotlib.
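The 1.5x IQR whisker convention can be computed directly, which also shows how the outlier points beyond the whiskers arise. The values below are illustrative percentages, not the actual cluster data:

```python
# Illustrative percentages of nodes at one topic distance, one value
# per cluster.
values = sorted([5, 10, 12, 15, 18, 20, 22, 30, 95])

def quartiles(xs):
    """First and third quartiles via linear interpolation (the same
    convention numpy/matplotlib use by default)."""
    def q(p):
        i = p * (len(xs) - 1)
        lo, hi = int(i), min(int(i) + 1, len(xs) - 1)
        return xs[lo] + (i - lo) * (xs[hi] - xs[lo])
    return q(0.25), q(0.75)

q1, q3 = quartiles(values)
iqr = q3 - q1
upper_whisker = q3 + 1.5 * iqr
# Anything beyond the whisker is drawn as an individual outlier point.
outliers = [x for x in values if x > upper_whisker]
```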

Nodes at distance zero are the topic nodes.  We clearly see clusters tend to contain nodes at a distance of two and three relationships from the topics, which is to be expected as the number of nodes increases exponentially as we get further from the topic.  What is interesting, though, are the tails of the distributions. Clusters in which topic nodes make up a large share are likely small clusters, as there is no way for a large cluster to be predominantly topic nodes without depriving other clusters of topic nodes. We do, however, see a very long tail in clusters with a significant number of nodes one relationship from a topic node. These can be categorized as clusters with a topic node or two and all other nodes in the cluster describing the topic. This is most likely whois data and a few IPs for the topic domains (see Figures 14 and 20 below), as IPs would also bring in BGP prefixes and ASNs that would implicitly be two and three relationships away from the topic. For these clusters composed predominantly of nodes one relationship from the topic, we can conclude they are not part of a larger infrastructure but a small structure around one or two domains.

Figure 19

The tails of the two- and three-relationship topic distances are interesting as well. We can expect the boxes to be large, as that implies significant interrelated infrastructure. This presents itself in the graph in chains such as “name server X”<-[described by]-“applepay domain”-[described by]->”IP address”<-[described by]-“other domain”-[described by]->”name server X”. The IP address would also have a relationship to a BGP prefix, which would have a relationship to an ASN. In that case the “other domain” and “BGP prefix” would be two relationships away and the ASN, three. Also, anything associated with the “other domain” will be at least three relationships away.  Due to the large boxes, though, the 1.5x IQR whiskers go all the way to 100%.  We can remove the whiskers and analyze the outliers above the box in Figure 19.  We see multiple clusters with 60% to 80% of their nodes two or three relationships from a topic. Effectively this means infrastructure that is more highly clustered with itself than with the topic domains, as nodes must have some relationship to topics to even be in the graph.  These are likely the clusters with no topics in them that we visually identified earlier.  As these clusters are of interest, we will manually analyze them in our next blog post.

Cluster Malice Analysis

As part of Verizon’s threat analysis, Verizon uses multiple intelligence feeds to enrich its data. As many people in the information security community have pointed out, intelligence feeds in the sense used in information security are little more than classifications of IP addresses and domains. However, we can apply that classification to our data to help add context to the clusters of infrastructure we have identified. We will apply scoring algorithms (discussed in the companion blog post) to propagate the malice score from the malice node to the rest of the subgraph, assessing the malice of nodes within the graph. The outcome of this analysis is aggregated per cluster in the above data frame as the column “aggMaliceScore”.  We also capture the percentage of nodes in each cluster at each distance from the malice node, similar to our topic distance percentages above.
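The actual scoring algorithm is described in the companion post; as a stand-in, a simple breadth-first decay from the malice node illustrates the idea of malice diminishing with graph distance. The edges, node names, and decay factor are all illustrative assumptions:

```python
from collections import deque

# Illustrative adjacency: the malice node connects to classified nodes,
# which connect onward through the enrichment graph.
edges = {
    "malice": ["1.2.3.4"],
    "1.2.3.4": ["evil.example"],
    "evil.example": ["ns1.shared.example"],
    "ns1.shared.example": [],
}

def propagate(edges, source="malice", decay=0.5):
    """Assign each node a score that halves with each relationship
    away from the malice node (shortest-path only)."""
    scores = {source: 1.0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in edges.get(node, []):
            if nbr not in scores:
                scores[nbr] = scores[node] * decay
                queue.append(nbr)
    return scores
```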

 

Figure 9

We will start by analyzing the boxplots in Figure 9.  It immediately sticks out that the third quartile for two relationships from the malice node is 40%. This gives us a good idea of what a normal cluster looks like. The whisker, however, hides clusters above 40% due to the large IQR. We can remove the whiskers as we did earlier and see the actual points in Figure 10.

Figure 10

By computing the 95th percentile of the distribution, we see that it falls at roughly 80%. Between Figures 9 and 10, we see multiple clusters of interest. Those that are more than 80% comprised of nodes at distance two from the malice node are a good focus for investigation.  Also worth investigating are those clusters with a significant number of nodes with a direct relationship to the malice node (i.e., a malice distance of one). We see five clusters above the whisker in this category, which we will add to our list for manual analysis.
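The original one-liner is not reproduced in the post, but one plausible form of the percentile check, assuming a pandas series of per-cluster percentages, is:

```python
import pandas as pd

# Illustrative shares of each cluster's nodes at malice distance two.
pct_at_distance_two = pd.Series(
    [0.10, 0.20, 0.35, 0.40, 0.41, 0.55, 0.80, 0.85])

# The 95th percentile acts as the cutoff for clusters worth investigating.
cutoff = pct_at_distance_two.quantile(0.95)
candidates = pct_at_distance_two[pct_at_distance_two > cutoff]
```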

Figure 11

What is not captured in Figures 9 and 10, however, is the aggregate score. The aggregate malice score is the malice accumulated across all of the nodes within the cluster. Figure 11 provides that view.

While the data in the boxplot is normalized, it is not a percentage. In this plot, we see a long tail, with a single cluster whose aggregate score is nearly twice the next highest.  This is likely a small cluster with multiple nodes related to malice.  We will add all of the clusters above the 1.5x IQR whisker to the list for manual investigation as potentially malicious infrastructure.

Figure 17

It may also be helpful to identify nodes with significantly high malice scores and their associated clusters. This can surface clusters containing a very malicious node alongside a non-malicious subset that drags down the cluster’s aggregate malice score.  Figure 17 shows the boxplot of normalized individual node malice scores. Using the same centrality and scale estimation algorithms as above, we choose a cutoff of about 0.23, which leaves 1734 nodes in 81 clusters.
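Applying that cutoff amounts to filtering nodes by score and collecting their clusters. The 0.23 threshold mirrors the post; the node data below is made up for illustration:

```python
# Illustrative (node name, cluster id, normalized malice score) rows.
nodes = [
    ("n1", 16, 0.95), ("n2", 16, 0.30), ("n3", 2, 0.88),
    ("n4", 0, 0.25), ("n5", 7, 0.10),
]

cutoff = 0.23
# Keep the outlier nodes above the cutoff ...
outliers = [(name, cluster) for name, cluster, score in nodes
            if score > cutoff]
# ... and the distinct clusters they fall in, for manual review.
clusters_of_interest = sorted({cluster for _, cluster in outliers})
```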

Next Post

In the next post, we will take the clusters identified through statistical analysis and manually analyze them to determine if our statistical process has appropriately identified previously unknown malice, providing true threat intelligence.  Stay tuned!