Microsoft’s cloud-based security information and event management software, Sentinel, is built on top of Azure’s data management tooling, including Azure Monitor and its built-in Log Analytics. One key part of this suite is Azure Data Explorer, a tool used to explore and analyze data with queries across multiple stores, mixing structured and unstructured data in a data lake.
SEE: Learn Microsoft Azure online (TechRepublic Academy)
At the heart of Data Explorer is a query language called Kusto, usually known as KQL, that’s designed to help find patterns in data. Unlike SQL, KQL is only intended to read data. That’s an important feature for a SIEM tool like Sentinel, where users need to work non-destructively, treating log data purely as a source of information.
It’s an approach much like that used in large-scale data warehouses, where queries are used to find data as quickly as possible, to help make critical business decisions.
Kusto query language speeds incident response times
Kusto’s support for working with large amounts of data is key to its use in Sentinel, as logs and other security data rapidly become large repositories. The actions of a bad actor or malware are the proverbial needle in the haystack of logs, so a tool that’s optimized for this type of query is essential.
Relying on Sentinel’s automated tooling may leave you at a disadvantage, with a reactive rather than proactive posture. Being able to conduct your own investigations is key to keeping on top of your security posture as well as for providing inputs that help train future SIEM rulesets.
Enterprise security has become a big data problem, so using big data tooling to help generate insights makes sense, especially when you’re regularly exploring specific log files or combinations of logs. As KQL is designed for this sort of task and used by some of the analytical tools that come together in Sentinel, it’s the obvious choice. As well as being used for ad hoc queries, KQL can be embedded in runbooks to help automate responses and custom analysis.
Building KQL queries
KQL is an interesting hybrid of scripting and query tools, so it’s familiar to anyone who’s used Python for data science or SQL for working with databases. It’s designed to work against tables of data, with the ability to create variables and constants that can help control the flow of a set of KQL statements.
A good way to think of a KQL query is as a pipeline: It involves first getting data, then filtering it, before summarizing and sorting, and finally selecting the results you need. There’s some similarity to the structure of a PowerShell command, with a more explicit requirement for ordering statements, as changing the order of filters and summaries can significantly affect the output.
Getting the order of filters right is the key to building effective KQL
The pipeline used to execute KQL connects filters in series, so you want to make sure you filter data at the start of a query, minimizing the amount of data passed to subsequent stages. Microsoft has detailed best practices on KQL filter usage that can help as you work with your Sentinel data, including:
- Using time filters first.
- Avoiding searching for substrings.
- Only using specific columns for text searches.
This means it’s crucial to understand both your data sources and the results you are looking for before you build any Kusto query.
KQL’s pipeline model makes building and designing queries relatively easy
While KQL is easy to work with, you won’t get good results if you don’t understand the structure of your data. First, you need to know the names of all of the tables used in Sentinel’s workspace. These are needed to specify where you’re getting data from, with modifiers to take only a set number of rows and to limit how much data is returned.
This data then needs to be sorted, with the option of taking only the latest results. Next, the data can be filtered, so for example, you’re only getting data from a specific IP range or for a set time period.
Once data has been selected and filtered, it’s summarized. This creates a new table with only the data you’ve filtered and only in the columns you’ve chosen. Columns can be renamed as needed and can even be the product of KQL functions — for example summing data or using the maximum and minimum values for the data.
The available functions include basic statistical operations, so you can use your queries to look for significant data — a useful tool when hunting suspected intrusions through gigabytes of logs. More complex operations can be carried out using the evaluate operator, which uses plug-ins to handle data science tasks.
SEE: Hiring kit: Data scientist (TechRepublic Premium)
While most KQL operations are carried out across a single log table, you can use union or join statements to work with multiple tables at the same time. This lets you start to correlate data across logs, where the signals of an attack might be more obvious.
Learn KQL by using KQL in Sentinel
The result is a powerful set of tools that, with some experience, shape up into a way of repeatedly exploring log files, looking for signs of attacks and attackers. Queries can be embedded in Sentinel workbooks, so they can be shared across everyone in your security operations center.
Usefully there are training workbooks built into Sentinel that can be used to speed up learning the language and that offer examples of how KQL can be used in different use cases.
If you want to experiment before getting started, you don’t need to have Sentinel installed, as Microsoft has a demo Azure Log Analytics environment in the Azure portal that can be used to experiment with KQL query design. It’s free to use, too; all you need is an Azure log-in.
SIEM tools like Sentinel simplify getting and acting on data from log files. But, machine learning is still no match for the eyes of an experienced security analyst, especially when it comes to new attacks and the subtle signals of advanced persistent threats.
That’s why including KQL as part of Sentinel makes a lot of sense, as it mixes advanced analytics tools with the simplicity of a scripting language. When combined with tools like Power BI, the result is a way to quickly analyze and visualize gigabytes of log data, finding the information needed to keep your network secure.