diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..3afd1b0 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,46 @@ +# SC4S Architectural Considerations + +There are some key architectural considerations and recommendations that will yield extremely performant and reliable syslog +data collection while minimizing the "over-engineering" that is common in many syslog data collection designs. These +recommendations are not specific to Splunk Connect for Syslog, but rather stem from the syslog protocol itself -- and its age. + +## The syslog Protocol + +The syslog protocol was designed in the mid 1980s to offer very high-speed, network-based logging for network and security devices that +were (especially at the time) starved for CPU and I/O resources. For this reason, the protocol was designed for speed and efficiency at the +expense of resiliency/reliability. UDP was chosen due to its ability to "send and forget" the events over the network without regard +(or acknowledgment) of receipt. In later years, TCP was added as a transport, as well as TLS/SSL. In spite of these additions, UDP still +retains favor as a syslog transport for most data centers, and for the same reasons as originally designed. + +Because of these tradeoffs selected by the original designers (and retained to this day), traditional methods used to provide scale and +resiliency do not necessarily transfer to the syslog world. We will discuss (and reference) some of the salient points below. + +## Collector Location + +Due to syslog being a "send and forget" protocol, it does not perform well when routed through substantial (and especially WAN) network infrastructure. +This _includes_ front-side load balancers. The most reliable way to collect syslog traffic is to provide for _edge_ +(not centralized) collection. Resist the urge to centrally locate any syslog server (sc4s included) and expect the UDP and (stateless) +TCP traffic to "make it". 
Data loss will undoubtedly occur. + +## syslog Data Collection at Scale + +In concert with attempts to centralize syslog, many admins will co-locate several syslog-ng servers for horizontal scale, and load balance +to them with a front-side load balancer. For many reasons (that go beyond this short discussion) this is not a best practice. Briefly: + +* The attempt to load balance for scale (and HA -- see below) will actually cause _more_ data loss due to normal device operations +and attendant buffer loss than would be the case if a simple, robust single server (or shared-IP cluster) were used. + +* Front-side load balancing will also cause inadequate data distribution on the upstream side, leading to data unevenness on the indexers. + +## HA Considerations and Challenges + +In addition to scale, many opt to load balance for high availability. While a sound approach for stateful, application-level protocols such +as http, it does not work well for stateless, unacknowledged syslog traffic. Again, in the attempt to design for HA, more data ends up +being lost vs. simpler designs such as vMotioned VMs. With syslog, always remember that the protocol _itself_ is lossy, and there +_will_ be data loss (think CD-quality (lossless) vs. MP3). Syslog data collection can be made, at best, "Mostly Available". + +## UDP vs. TCP + +Paradoxically, UDP actually ends up being the better choice for syslog resiliency. For an excellent discussion on this topic +(as well as the "myth" of load balancers for HA), +see [Performant AND Reliable Syslog: UDP is best](https://www.rfaircloth.com/2020/05/21/performant-and-reliable-syslog-udp-is-best/). 
diff --git a/docs/configuration.md b/docs/configuration.md index 2f92b78..4029d54 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -35,34 +35,34 @@ for the alternate HEC destination `d_hec_FOO` to 24, set `SC4S_DEST_SPLUNK_HEC_F ## Creation of Additional Splunk HEC Destinations -Additional Splunk HEC destinations can be dynamically created through environment variables. The use of these destinations can then be controlled -along with other user-defined destinations on a global or per-source basis (see "Alternate Destination Use" immediately below). +Additional Splunk HEC destinations can be dynamically created through environment variables. When set, the destinations will be +created with the `DESTID` appended, for example: `d_hec_FOO`. These destinations can then be specified for use (along with any other +destinations created locally) either globally or per source. See the "Alternate Destination Use" in the next section for details. | Variable | Values | Description | |----------|---------------|-------------| -| SPLUNK_HEC_ALT_DESTS | Comma or space-separated UPPER case list of destination ids | destination IDs are UPPER case single-word friendly strings used to identify the new destination, which will be named with the destination id appended, for example `d_hec_FOO` | -| SPLUNK_HEC<DESTID>_URL | url | Example: `SPLUNK_HEC_FOO_URL=https://splunk:8088`. `DESTID` must be a member of the list configured in `SPLUNK_HEC_ALT_DESTS` configured above | -| SPLUNK_HEC<DESTID>_TOKEN | string | Example: `SPLUNK_HEC_BAR_TOKEN=<token>`. 
`DESTID` must be a member of the list configured in `SPLUNK_HEC_ALT_DESTS` configured above | +| SPLUNK_HEC_ALT_DESTS | Comma or space-separated UPPER case list of destination IDs | Destination IDs are UPPER case, single-word friendly strings used to identify the new destinations which will be named with the `DESTID` appended, for example `d_hec_FOO` | +| SPLUNK_HEC_<DESTID>_URL | url | Example: `SPLUNK_HEC_FOO_URL=https://splunk:8088` `DESTID` must be a member of the list specified in `SPLUNK_HEC_ALT_DESTS` configured above | +| SPLUNK_HEC_<DESTID>_TOKEN | string | Example: `SPLUNK_HEC_BAR_TOKEN=` `DESTID` must be a member of the list specified in `SPLUNK_HEC_ALT_DESTS` configured above | -When set above, the destinations will be created with the `DESTID` appended, for example: `d_hec_FOO`. These destinations can then be -specified below (along with any other destinations created locally) either globally or per source. - -* NOTE: The `DESTID` specified in the `URL` and `TOKEN` variables above _must_ match the `DESTID` entries enumerated the `SPLUNK_HEC_ALT_DESTS` list. -Failure to do so will cause destinations to be created without proper HEC parameters. +* NOTE: The `DESTID` specified in the `URL` and `TOKEN` variables above _must_ match the `DESTID` entries enumerated in the +`SPLUNK_HEC_ALT_DESTS` list. For each `DESTID` value specified in `SPLUNK_HEC_ALT_DESTS` there must be a corresponding `URL` and `TOKEN` +variable set as well. Failure to do so will cause destinations to be created without proper HEC parameters which will result in connection +failure. * NOTE: Additional Splunk HEC destinations will _not_ be tested at startup. It is the responsiblity of the admin to ensure that additional destinations are provisioned with the correct URL(s) and tokens to ensure proper connectivity. * NOTE: The disk and CPU requirements will increase proportionally depending on the number of additional HEC destinations in use (e.g. 
each HEC -destination will have its own disk buffer). +destination will have its own disk buffer by default). ## Alternate Destination Use -All alternate destinations (including alternate HEC destinations) are configured for use in SC4S through variables. Global and/or source-specific forms of -the variables below can be used to send data to alternate destinations. +All alternate destinations (including alternate HEC destinations) are configured for use in SC4S through the variables below. Global and/or +source-specific forms of the variables below can be used to send data to additional and/or alternate destinations. * NOTE: The administrator is responsible for ensuring that any non-HEC alternate destinations are configured in the -local mount tree, and that syslog-ng properly parses them. +local mount tree, and that the underlying syslog-ng process in sc4s properly parses them. * NOTE: Do not include the primary HEC destination (`d_hec`) in any list of alternate destinations. The configuration of the primary HEC destination is configured separately from that of the alternates below. However, _alternate_ HEC destinations (e.g. `d_hec_FOO`) should be configured below, just @@ -192,10 +192,9 @@ Here is a snippet from the `splunk_metadata.csv` file: juniper_netscreen,index,ns_index ``` -The columns in this file are `key`, `metadata`, and `value`. By default, the keys in this file are "commented out", but in reality CSV files -cannot have comments so the `#` simply causes a mismatch to the key reference, effectively "commenting" it out. Therefore, to ensure there -is a match from the log path that references this file, be sure to remove the leading `#`. Once this is done, the following changes can be -made by modifying and/or adding rows in the table and specifying one or more of the following `metadata`/`value` pairs for a given `key`: +The columns in this file are `key`, `metadata`, and `value`. 
Defaults are populated into this file at initial startup, and any changes +made will be preserved on subsequent startups. Changes can be made by modifying and/or adding rows in the table and specifying one or more +of the following `metadata`/`value` pairs for a given `key`: * `index` to specify an alternate `value` for index * `source` to specify an alternate `value` for source @@ -206,15 +205,14 @@ made by modifying and/or adding rows in the table and specifying one or more of indexed by Splunk. Changing this carries the same warning as the sourcetype above; this will affect the upstream TA. The template choices are documented elsewhere in this "Configuration" section. -In this case, the `juniper_netscreen` key is "uncommented" (thereby enabling it), and the new index used for that data source will be -`ns_index`. +In this case, the `juniper_netscreen` key references a new index used for that data source called `ns_index`. In general, for most deployments the index should be the only change needed; other default metadata should almost never be overridden (particularly for the "Out of the Box" data sources). Even then, care should be taken when considering any alternates, as the defaults for SC4S were chosen with best practices in mind. -The `splunk_metadata.csv` file should also be appended to (with a "commented out" default for the index) when building custom SC4S log paths -(filters). Care should be taken during filter design to choose appropriate index, sourctype and template defaults, so that admins are not +The `splunk_metadata.csv` file should also be appended to with an appropriate default for the index when building a custom SC4S log path +(filter). Care should be taken during filter design to choose appropriate index, sourcetype and template defaults, so that admins are not compelled to override them. 
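To make the alternate HEC destination variables described in the configuration changes above concrete, here is a minimal sketch of `env_file` entries defining a single alternate destination. The `FOO` destination ID, the URL, and the token value are placeholders for illustration only, not real values:

```shell
# Hypothetical env_file entries for one alternate HEC destination "FOO".
# The resulting syslog-ng destination will be named d_hec_FOO.
SPLUNK_HEC_ALT_DESTS=FOO
# Each DESTID listed above must have a matching URL and TOKEN variable,
# or the destination will be created without proper HEC parameters.
SPLUNK_HEC_FOO_URL=https://splunk-alt.example.com:8088
SPLUNK_HEC_FOO_TOKEN=11111111-2222-3333-4444-555555555555
```

Remember that, unlike the primary HEC destination, these alternates are not tested at startup, so the URL and token should be verified by hand.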
diff --git a/docs/gettingstarted/byoe-rhel7.md b/docs/gettingstarted/byoe-rhel7.md index 47f6493..8c98415 100644 --- a/docs/gettingstarted/byoe-rhel7.md +++ b/docs/gettingstarted/byoe-rhel7.md @@ -47,12 +47,13 @@ sudo yum install ./epel-release-latest-*.noarch.rpm -y sudo subscription-manager repos --enable rhel-7-server-optional-rpms ``` -* Enable the "stable" unofficial repo for syslog-ng and install required packages +* Enable the "stable" unofficial repo for syslog-ng and install required packages. The last package, `syslog-ng-afsnmp`, is only required +when using the optional snmp trap collection facility (disabled by default). ```bash cd /etc/yum.repos.d/ sudo wget https://copr.fedorainfracloud.org/coprs/czanik/syslog-ng-stable/repo/epel-7/czanik-syslog-ng-stable-epel-7.repo -sudo yum install syslog-ng syslog-ng-http syslog-ng-python +sudo yum install syslog-ng syslog-ng-http syslog-ng-python syslog-ng-afsnmp ``` * Optional step: Disable the distro-supplied syslog-ng unit file, as the syslog-ng process configured here will run as the `sc4s` @@ -64,14 +65,17 @@ sudo systemctl stop syslog-ng sudo systemctl disable syslog-ng ``` -* Download the latest bare_metal.tar from [releases](https://github.com/splunk/splunk-connect-for-syslog/releases) on github and untar the package in `/etc/syslog-ng` +* Download the latest bare_metal.tar from [releases](https://github.com/splunk/splunk-connect-for-syslog/releases) on github and untar the package in `/etc/syslog-ng` using the command example below. * NOTE: The `wget` process below will unpack a tarball with the sc4s version of the syslog-ng config files in the standard `/etc/syslog-ng` location, and _will_ overwrite existing content. Ensure that any previous configurations of syslog-ng are saved if needed prior to executing the download step. +* NOTE: At the time of writing, the latest release is `v1.24.0`. 
The latest release is typically listed first on the page above, unless +there is an `-alpha`,`-beta`, or `-rc` release that is newer (which will be clearly indicated). For production use, select the latest that does not have an `-rc`, `-alpha`, or `-beta` suffix. + ```bash -sudo wget -c https://github.com/splunk/splunk-connect-for-syslog/releases/download/latest/baremetal.tar -O - | sudo tar -x -C /etc/syslog-ng +sudo wget -c https://github.com/splunk/splunk-connect-for-syslog/releases/download//baremetal.tar -O - | sudo tar -x -C /etc/syslog-ng ``` * Install gomplate and confirm that the version is 3.5.0 or newer @@ -82,10 +86,6 @@ sudo chmod 755 /usr/local/bin/gomplate gomplate --version ``` -* Install the latest python - -```scl enable rh-python36 bash``` - * create the sc4s unit file ``/lib/systemd/system/sc4s.service`` and add the following content ```ini @@ -122,7 +122,6 @@ Add the following content (but be sure to check the note above to ensure the lat ```bash #!/usr/bin/env bash -source scl_source enable rh-python36 cd /etc/syslog-ng #The following is no longer needed but retained as a comment just in case we run into command line length issues @@ -136,7 +135,8 @@ cd /etc/syslog-ng # --output-map="$d/{{ .in | strings.ReplaceAll \".conf.tmpl\" \".conf\" }}" #done -gomplate $(find . -name *.tmpl | sed -E 's/^(\/.*\/)*(.*)\..*$/--file=\2.tmpl --out=\2/') --template t=go_templates/ +# Ensure gomplate is in the shell path or provide the full pathname to the executable +/usr/local/bin/gomplate $(find . -name "*.tmpl" | sed -E 's/^(\/.*\/)*(.*)\..*$/--file=\2.tmpl --out=\2/') --template t=go_templates/ mkdir -p /etc/syslog-ng/conf.d/local/context/ mkdir -p /etc/syslog-ng/conf.d/local/config/ @@ -145,9 +145,9 @@ for file in /etc/syslog-ng/conf.d/local/context/*.example ; do cp -v -n $file ${ cp -v -R /etc/syslog-ng/local_config/* /etc/syslog-ng/conf.d/local/config/ ``` -* (Optional) Execute the preconfiguration shell script created above. 
You may also optionally execute it as part of the unit -file, which is recommended. If you elect _not_ to execute the script in the unit file, care must be taken to execute it manually "out of band" -when any changes are made. +* Execute the preconfiguration shell script created above prior to starting sc4s. You may also optionally execute it as part of a systemd unit +file (as shown above), which is recommended. If you elect _not_ to execute the script as part of systemd, care must be taken to execute it +manually "out of band" when any changes are made. ```bash sudo bash /opt/sc4s/bin/preconfig.sh @@ -187,12 +187,4 @@ the data. In other cases, a unique listening port is required for certain devic For collection of such sources we provide a means of dedicating a unique listening port to a specific source. Refer to the "Sources" documentation to identify the specific environment variables used to enable unique listening ports for the technology -in use. - -## Unique Ports for Device "Families" - -Certain technology "families", such as CEF and Fortinet, are handled by a single log path in SC4S. To set unique ports for individual -devices in a family (e.g. one each for Fortiweb and FortiOS), the container version of SC4S uses "container networking" (detailed -in the source document for the respective device families). This, of course, is not avaialble in BYOE. For this reason, the syslog-ng source -configuration for the extra ports that need to be mapped will need to be added manually to either the template or final "conf" version of the -respective log path file. \ No newline at end of file +in use. 
\ No newline at end of file diff --git a/docs/gettingstarted/docker-swarm-general.md b/docs/gettingstarted/docker-swarm-general.md index 7c97c52..6bd6f59 100644 --- a/docs/gettingstarted/docker-swarm-general.md +++ b/docs/gettingstarted/docker-swarm-general.md @@ -114,31 +114,13 @@ SPLUNK_HEC_TOKEN=a778f63a-5dff-4e3c-a72c-a03183659e94 Acknowledgement when deploying the HEC token on the Splunk side; the underlying syslog-ng http destination does not support this feature. Moreover, HEC Ack would significantly degrade performance for streaming data such as syslog. -* Set `SC4S_DEST_SPLUNK_HEC_WORKERS` to match the number of indexers and/or HWFs with HEC endpoints, up to a maxiumum of 32. -If the endpoint is a VIP, match this value to the total number of indexers behind the load balancer. +* The default number of `SC4S_DEST_SPLUNK_HEC_WORKERS` is 10. Consult the community if you feel the number of workers (threads) should +deviate from this. * NOTE: Splunk Connect for Syslog defaults to secure configurations. If you are not using trusted SSL certificates, be sure to uncomment the last line in the example above. -## Configure SC4S Default Listening Ports - -Most enterprises use UDP/TCP port 514 as the default as their main listening port for syslog "soup" traffic, and TCP port 6514 for TLS. -The docker compose file and standard SC4S configurations reflect these defaults. If it desired to change some or all of them, container -port mapping can be used to change the defaults without altering the underlying SC4S configuration. To do this, simply change the -``published`` port(s) in the docker compose file (which represents the actual listening ports on the host machine), like so: - -``` - ports: - - target: 514 - published: 614 - protocol: tcp -#Comment the following line out if using docker-compose - mode: host -``` -This snippet above instructs the _host_ to listen on TCP port 614 and map that port to the default TCP 514 port on the _container_. 
-No changes to the underlying SC4S default configuration (environment variables) are needed. - -### Dedicated (Unique) Listening Ports +## Dedicated (Unique) Listening Ports For certain source technologies, categorization by message content is impossible due to the lack of a unique "fingerprint" in the data. In other cases, a unique listening port is required for certain devices due to network requirements in the enterprise. @@ -173,9 +155,10 @@ can be ammended with additional ``target`` stanzas in the ``ports`` section of t Log paths are preconfigured to utilize a convention of index destinations that are suitable for most customers. * If changes need to be made to index destinations, navigate to the ``/opt/sc4s/local/context`` directory to start. -* Edit `splunk_metadata.csv` to review or change the index configuration and revise as required for the data sources utilized in your -environment. Simply uncomment the relevant line and enter the desired index. The "Sources" document details the specific entries in -this table that pertain to the individual data source filters that are included with SC4S. +* Edit `splunk_metadata.csv` to review or change the index configuration as required for the data sources utilized in your +environment. The key (1st column) in this file uses the syntax `vendor_product`. Simply replace the index value (the 3rd column) in the +desired row with the index appropriate for your Splunk installation. The "Sources" document details the specific `vendor_product` keys (rows) +in this table that pertain to the individual data source filters that are included with SC4S. * Other Splunk metadata (e.g. source and sourcetype) can be overriden via this file as well. This is an advanced topic, and further information is covered in the "Log Path overrides" section of the Configuration document. 
@@ -223,7 +206,7 @@ execute the following search in Splunk: ```ini index=* sourcetype=sc4s:events "starting up" ``` -This should yield the following event: +This should yield an event similar to the following: ```ini syslog-ng starting up; version='3.28.1' ``` @@ -246,6 +229,4 @@ syslog-ng checking config sc4s version=v1.24.0 syslog-ng starting ``` -If you see http server errors such as 4xx or 5xx responses from the http (HEC) endpoint, one or more of the items above are likely set -incorrectly. If validating/fixing the configuration fails to correct the problem, proceed to the "Troubleshooting" section for more -information. +If you do not see the output above, proceed to the "Troubleshooting" section for more detailed information. diff --git a/docs/gettingstarted/docker-swarm-rhel7.md b/docs/gettingstarted/docker-swarm-rhel7.md index e10d7ce..b53ba1e 100644 --- a/docs/gettingstarted/docker-swarm-rhel7.md +++ b/docs/gettingstarted/docker-swarm-rhel7.md @@ -122,31 +122,13 @@ SPLUNK_HEC_TOKEN=a778f63a-5dff-4e3c-a72c-a03183659e94 Acknowledgement when deploying the HEC token on the Splunk side; the underlying syslog-ng http destination does not support this feature. Moreover, HEC Ack would significantly degrade performance for streaming data such as syslog. -* Set `SC4S_DEST_SPLUNK_HEC_WORKERS` to match the number of indexers and/or HWFs with HEC endpoints, up to a maxiumum of 32. -If the endpoint is a VIP, match this value to the total number of indexers behind the load balancer. +* The default number of `SC4S_DEST_SPLUNK_HEC_WORKERS` is 10. Consult the community if you feel the number of workers (threads) should +deviate from this. * NOTE: Splunk Connect for Syslog defaults to secure configurations. If you are not using trusted SSL certificates, be sure to uncomment the last line in the example above. 
-## Configure SC4S Default Listening Ports - -Most enterprises use UDP/TCP port 514 as the default as their main listening port for syslog "soup" traffic, and TCP port 6514 for TLS. -The docker compose file and standard SC4S configurations reflect these defaults. If it desired to change some or all of them, container -port mapping can be used to change the defaults without altering the underlying SC4S configuration. To do this, simply change the -``published`` port(s) in the docker compose file (which represents the actual listening ports on the host machine), like so: - -``` - ports: - - target: 514 - published: 614 - protocol: tcp -#Comment the following line out if using docker-compose - mode: host -``` -This snippet above instructs the _host_ to listen on TCP port 614 and map that port to the default TCP 514 port on the _container_. -No changes to the underlying SC4S default configuration (environment variables) are needed. - -### Dedicated (Unique) Listening Ports +## Dedicated (Unique) Listening Ports For certain source technologies, categorization by message content is impossible due to the lack of a unique "fingerprint" in the data. In other cases, a unique listening port is required for certain devices due to network requirements in the enterprise. @@ -181,9 +163,10 @@ can be ammended with additional ``target`` stanzas in the ``ports`` section of t Log paths are preconfigured to utilize a convention of index destinations that are suitable for most customers. * If changes need to be made to index destinations, navigate to the ``/opt/sc4s/local/context`` directory to start. -* Edit `splunk_metadata.csv` to review or change the index configuration and revise as required for the data sources utilized in your -environment. Simply uncomment the relevant line and enter the desired index. The "Sources" document details the specific entries in -this table that pertain to the individual data source filters that are included with SC4S. 
+* Edit `splunk_metadata.csv` to review or change the index configuration as required for the data sources utilized in your +environment. The key (1st column) in this file uses the syntax `vendor_product`. Simply replace the index value (the 3rd column) in the +desired row with the index appropriate for your Splunk installation. The "Sources" document details the specific `vendor_product` keys (rows) +in this table that pertain to the individual data source filters that are included with SC4S. * Other Splunk metadata (e.g. source and sourcetype) can be overriden via this file as well. This is an advanced topic, and further information is covered in the "Log Path overrides" section of the Configuration document. @@ -231,7 +214,7 @@ execute the following search in Splunk: ```ini index=* sourcetype=sc4s:events "starting up" ``` -This should yield the following event: +This should yield an event similar to the following: ```ini syslog-ng starting up; version='3.28.1' ``` @@ -254,6 +237,4 @@ syslog-ng checking config sc4s version=v1.24.0 syslog-ng starting ``` -If you see http server errors such as 4xx or 5xx responses from the http (HEC) endpoint, one or more of the items above are likely set -incorrectly. If validating/fixing the configuration fails to correct the problem, proceed to the "Troubleshooting" section for more -information. \ No newline at end of file +If you do not see the output above, proceed to the "Troubleshooting" section for more detailed information. diff --git a/docs/gettingstarted/docker-systemd-general.md b/docs/gettingstarted/docker-systemd-general.md index 61c8cd9..60ece81 100644 --- a/docs/gettingstarted/docker-systemd-general.md +++ b/docs/gettingstarted/docker-systemd-general.md @@ -118,8 +118,8 @@ SPLUNK_HEC_TOKEN=a778f63a-5dff-4e3c-a72c-a03183659e94 Acknowledgement when deploying the HEC token on the Splunk side; the underlying syslog-ng http destination does not support this feature. 
Moreover, HEC Ack would significantly degrade performance for streaming data such as syslog. -* Set `SC4S_DEST_SPLUNK_HEC_WORKERS` to match the number of indexers and/or HWFs with HEC endpoints, up to a maxiumum of 32. -If the endpoint is a VIP, match this value to the total number of indexers behind the load balancer. +* The default number of `SC4S_DEST_SPLUNK_HEC_WORKERS` is 10. Consult the community if you feel the number of workers (threads) should +deviate from this. * NOTE: Splunk Connect for Syslog defaults to secure configurations. If you are not using trusted SSL certificates, be sure to uncomment the last line in the example above. @@ -150,10 +150,10 @@ ExecStart=/usr/bin/docker run -p 514:514 -p 514:514/udp -p 6514:6514 -p 5000-502 Log paths are preconfigured to utilize a convention of index destinations that are suitable for most customers. * If changes need to be made to index destinations, navigate to the ``/opt/sc4s/local/context`` directory to start. -* Edit `splunk_metadata.csv` to review or change the index configuration and revise as required for the data sources utilized in your -environment. The key (1st column) in this file uses the syntax `vendor_product`. Simply replace the index value (the 3rd column) in the desired -row with the index appropriate for your Splunk installation. The "Sources" document details the specific keys (rows) in this table that pertain to the -individual data source filters that are included with SC4S. +* Edit `splunk_metadata.csv` to review or change the index configuration as required for the data sources utilized in your +environment. The key (1st column) in this file uses the syntax `vendor_product`. Simply replace the index value (the 3rd column) in the +desired row with the index appropriate for your Splunk installation. The "Sources" document details the specific `vendor_product` keys (rows) +in this table that pertain to the individual data source filters that are included with SC4S. 
* Other Splunk metadata (e.g. source and sourcetype) can be overriden via this file as well. This is an advanced topic, and further information is covered in the "Log Path overrides" section of the Configuration document. @@ -220,7 +220,7 @@ execute the following search in Splunk: ```ini index=* sourcetype=sc4s:events "starting up" ``` -This should yield the following event: +This should yield an event similar to the following: ```ini syslog-ng starting up; version='3.28.1' ``` @@ -244,4 +244,3 @@ sc4s version=v1.24.0 syslog-ng starting ``` If you do not see the output above, proceed to the "Troubleshooting" section for more detailed information. - diff --git a/docs/gettingstarted/index.md b/docs/gettingstarted/index.md index 36be6ed..ecc06ec 100644 --- a/docs/gettingstarted/index.md +++ b/docs/gettingstarted/index.md @@ -103,6 +103,21 @@ documentation regarding tuning syslog-ng in particular (via the [SC4S_SOURCE_UDP environment variable in sc4s) as well as overall host kernel tuning. The default values for receive kernel buffers in most distros is 2 MB, which has proven inadequate for many. +#### IPv4 Forwarding + +In many distributions (e.g. CentOS provisioned in AWS), IPV4 forwarding is _not_ enabled by default. +This needs to be enabled for container networking to function properly. 
The following is an example +to set this up; as usual this needs to be vetted with your enterprise security policy: + +```sudo sysctl net.ipv4.ip_forward=1``` + +To ensure the change survives a reboot edit /etc/sysctl.conf, find (or add) the text below, and uncomment as shown: + +``` +# Uncomment the next line to enable packet forwarding for IPv4 +net.ipv4.ip_forward=1 +``` + #### Select a Container Runtime and SC4S Configuration | Container Runtime and Orchestration | Operating Systems | diff --git a/docs/gettingstarted/podman-systemd-general.md b/docs/gettingstarted/podman-systemd-general.md index 347a053..b543f07 100644 --- a/docs/gettingstarted/podman-systemd-general.md +++ b/docs/gettingstarted/podman-systemd-general.md @@ -137,8 +137,8 @@ SPLUNK_HEC_TOKEN=a778f63a-5dff-4e3c-a72c-a03183659e94 Acknowledgement when deploying the HEC token on the Splunk side; the underlying syslog-ng http destination does not support this feature. Moreover, HEC Ack would significantly degrade performance for streaming data such as syslog. -* Set `SC4S_DEST_SPLUNK_HEC_WORKERS` to match the number of indexers and/or HWFs with HEC endpoints, up to a maxiumum of 32. -If the endpoint is a VIP, match this value to the total number of indexers behind the load balancer. +* The default number of `SC4S_DEST_SPLUNK_HEC_WORKERS` is 10. Consult the community if you feel the number of workers (threads) should +deviate from this. * NOTE: Splunk Connect for Syslog defaults to secure configurations. If you are not using trusted SSL certificates, be sure to uncomment the last line in the example above. @@ -169,10 +169,10 @@ ExecStart=/usr/bin/podman run -p 514:514 -p 514:514/udp -p 6514:6514 -p 5000-502 Log paths are preconfigured to utilize a convention of index destinations that are suitable for most customers. * If changes need to be made to index destinations, navigate to the ``/opt/sc4s/local/context`` directory to start. 
-* Edit `splunk_metadata.csv` to review or change the index configuration and revise as required for the data sources utilized in your -environment. The key (1st column) in this file uses the syntax `vendor_product`. Simply replace the index value (the 3rd column) in the desired -row with the index appropriate for your Splunk installation. The "Sources" document details the specific keys (rows) in this table that pertain to the -individual data source filters that are included with SC4S. +* Edit `splunk_metadata.csv` to review or change the index configuration as required for the data sources utilized in your +environment. The key (1st column) in this file uses the syntax `vendor_product`. Simply replace the index value (the 3rd column) in the +desired row with the index appropriate for your Splunk installation. The "Sources" document details the specific `vendor_product` keys (rows) +in this table that pertain to the individual data source filters that are included with SC4S. * Other Splunk metadata (e.g. source and sourcetype) can be overriden via this file as well. This is an advanced topic, and further information is covered in the "Log Path overrides" section of the Configuration document. @@ -239,7 +239,7 @@ execute the following search in Splunk: ```ini index=* sourcetype=sc4s:events "starting up" ``` -This should yield the following event: +This should yield an event similar to the following: ```ini syslog-ng starting up; version='3.28.1' ``` diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index b1c266a..bcc5682 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -1,26 +1,24 @@ #Troubleshooting -## General +## Startup -Prior to production deployment, it is easier to gauge proper operation outside of the systemd startup environment. systemctl/systemd -make it difficult to see the error output of problematic services, so rather than "fight it" there, it's best to confirm proper -operation directly on the CLI. 
-
-To test the container outside of the systemd startup environment, you can run the following to test the syntax
-of the container. These commands assume the local mounted directories are set up as shown in the gettingstarted
-examples:
+Most issues with sc4s startup and operation involve syntax errors or duplicate listening ports. If you are
+launching via systemd, you may see this at startup:
 
 ```bash
-/usr/bin/podman run -p 514:514 -p 514:514/udp -p 6514:6514 -p 5000-5020:5000-5020 -p 5000-5020:5000-5020/udp \
-    --env-file=/opt/sc4s/env_file \
-    -v splunk-sc4s-var:/opt/syslog-ng/var \
-    -v /opt/sc4s/local:/opt/syslog-ng/etc/conf.d/local:z \
-    -v /opt/sc4s/archive:/opt/syslog-ng/var/archive:z \
-    --name SC4S_preflight \
-    --rm splunk/scs:latest -s
+[root@sc4s syslog-ng]# systemctl start sc4s
+Job for sc4s.service failed because the control process exited with error code. See "systemctl status sc4s.service" and "journalctl -xe" for details.
+```
+A better command than `journalctl -xe` is the following,
+```
+journalctl -b -u sc4s | tail -100
 ```
+which will print the last 100 lines of the system journal in far more detail; this should be sufficient to see the specific failure
+(syntax or runtime) and guide you in troubleshooting.
 
-and you can run
+As an alternative to launching via systemd during the initial installation phase, you may wish to test the container startup outside of the
+systemd startup environment. The following command will launch the container directly from the CLI. This command assumes the local mounted
+directories are set up as shown in the "getting started" examples:
 
 ```bash
 /usr/bin/podman run -p 514:514 -p 514:514/udp -p 6514:6514 -p 5000-5020:5000-5020 -p 5000-5020:5000-5020/udp \
@@ -32,9 +30,27 @@ and you can run
     --rm splunk/scs:latest
 ```
-to test the final image. If you are using podman, substitute "podman" for "docker" for the container runtime command above.
+If you are using docker, substitute "docker" for "podman" in the container runtime command above.
 
-### Verification of TLS Server
+### Stale Containers (podman)
+
+In rare instances (especially when starting and stopping often), an SC4S container might not shut down completely when using podman, leaving a
+"stale" container behind that is denoted by a very long ID string. You will see this type of output when viewing the journal after a failed
+start caused by this condition, or a similar message when the container is run directly from the CLI:
+
+```
+Jul 15 18:45:20 sra-sc4s-alln01-02 podman[11187]: Error: error creating container storage: the container name "SC4S" is already in use by "894357502b2a7142d097ea3ca1468d1cb4fbc69959a9817a1bbe145a09d37fb9". You have to remove that container...
+Jul 15 18:45:20 sra-sc4s-alln01-02 systemd[1]: sc4s.service: Main process exited, code=exited, status=125/n/a
+```
+
+To rectify this, simply execute
+```
+podman rm -f 894357502b2a7142d097ea3ca1468d1cb4fbc69959a9817a1bbe145a09d37fb9
+```
+
+replacing the long string with whatever container ID is shown in your error message. SC4S should then start normally.
+
+## Verification of TLS Server
 
 To verify the correct configuration of the TLS server, use the following command. Use `podman` or `docker` and replace the IP, FQDN,
 and port as appropriate:
 
@@ -45,24 +61,38 @@ and port as appropriate:
 
 ## Validating HEC/token issues (AKA "No data in Splunk")
 
-The first thing to check are the container logs themselves, where stdout from the underlying syslog-ng is written by default. To do this,
-run:
+SC4S performs basic HEC connectivity and index checks at startup. These indicate general connection issues and indexes that may not be
+accessible and/or configured on the Splunk side. To check the container logs, which contain the results of these tests, run:
 
 ```bash
-/usr/bin/podman logs SC4S
+/usr/bin/podman logs SC4S
+```
+
+and note the output. You will see entries similar to these:
+
+```
+SC4S_ENV_CHECK_HEC: Splunk HEC connection test successful; checking indexes...
+
+SC4S_ENV_CHECK_INDEX: Checking email {"text":"Incorrect index","code":7,"invalid-event-number":1}
+SC4S_ENV_CHECK_INDEX: Checking epav {"text":"Incorrect index","code":7,"invalid-event-number":1}
+SC4S_ENV_CHECK_INDEX: Checking main {"text":"Success","code":0}
 ```
-and note the output. You may see entries similar to these:
+Note the specifics of the indexes that are not configured correctly, and rectify them in the Splunk configuration. If this is not addressed
+properly, you may see output similar to the following when data flows into sc4s:
+
 
 ```
 Mar 16 19:00:06 b817af4e89da syslog-ng[1]: Server returned with a 4XX (client errors) status code, which means we are not authorized or the URL is not found.; url='https://splunk-instance.com:8088/services/collector/event', status_code='400', driver='d_hec#0', location='/opt/syslog-ng/etc/conf.d/destinations/splunk_hec.conf:2:5'
 Mar 16 19:00:06 b817af4e89da syslog-ng[1]: Server disconnected while preparing messages for sending, trying again; driver='d_hec#0', location='/opt/syslog-ng/etc/conf.d/destinations/splunk_hec.conf:2:5', worker_index='4', time_reopen='10', batch_size='1000'
 ```
+
 This is an indication that the standard `d_hec` destination in syslog-ng (which is the route to Splunk) is being rejected by the HEC endpoint.
-A `400` error (not 404) is normally caused by an index that has not been created on the Splunk side, and is a common occurrence in new
-installations. This can present a serious problem, as just _one_ bad index will "taint" the entire batch (in this case, 1000 events) and
-prevent _any_ of them from being sent to Splunk. _It is imperative that the container logs be free of these kinds of errors in production._
+A `400` error (not 404) is normally caused by an index that has not been created on the Splunk side. This can present a serious problem, as
+just _one_ bad index will "taint" the entire batch (in this case, 1000 events) and prevent _any_ of them from being sent to Splunk. _It is
+imperative that the container logs be free of these kinds of errors in production._
 
-### Enabling the Alternate Debug Destination
+## Enabling the Alternate Debug Destination
 
 To help debug why the `400` errors are occurring, it is helpful to enable an alternate destination for syslog traffic that will write the
 contents of the full JSON payload that is intended to be sent to Splunk via HEC. This destination will contain each event, repackaged
@@ -81,7 +111,37 @@ curl -k -u "sc4s HEC debug:a778f63a-5dff-4e3c-a72c-a03183659e94" "https://splunk
 command line to determine what, exactly, the HEC endpoint is returning. This can be used to refine the index or other parameters to correct
 the problem.
 
-## "Exec" into the container
+## Obtaining "On-the-wire" Raw Events
+
+In almost all cases during development or troubleshooting, you will need to obtain samples of the messages exactly as they are received by
+SC4S. These "raw" events contain the full syslog message (including the `<PRI>` preamble) and differ from those that appear in Splunk after
+processing by sc4s and/or Splunk. This is the only way to determine if SC4S parsers and filters are operating correctly, as raw messages are
+needed for "playback" when testing. In addition, the community supporting SC4S will always first ask for raw samples (kind of like the way
+Splunk support always asks for "diags") before any development or troubleshooting exercise.
+
+Here are some options for obtaining raw logs for one or more sourcetypes:
+
+* Run `tcpdump` on the collection interface and display the results in ASCII. You will see events of the form
+```
+<165>1 2007-02-15T09:17:15.719Z router1 mgd 3046 UI_DBASE_LOGOUT_EVENT [junos@2636.1.1.1.2.18 username="user"] User 'user' exiting configuration mode
+```
+buried in the packet contents.
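As a sketch of the `tcpdump` approach: the interface name is a placeholder, and the "capture" below is simulated text standing in for real `tcpdump -A` output, so only the `grep` actually runs here:

```shell
# Capture syslog traffic and print packet contents in ASCII (requires root;
# "eth0" is a placeholder for your collection interface):
#   tcpdump -i eth0 -A -n 'port 514'
#
# The raw event can then be pulled out of the ASCII dump, for example:
sample_packet='E..x..@.@...<165>1 2007-02-15T09:17:15.719Z router1 mgd 3046 - - User exiting configuration mode'
printf '%s\n' "$sample_packet" | grep -oE '<[0-9]{1,3}>.*'
```

Everything from the `<PRI>` preamble to the end of the line is the raw sample to save for playback.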
+
+* Set the variable `SC4S_SOURCE_STORE_RAWMSG=yes` in `env_file` and restart sc4s. This will store the raw message in a syslog-ng macro called
+`RAWMSG`, which will be displayed in Splunk for all `fallback` messages. For most other sourcetypes, the `RAWMSG` is _not_ displayed, but can be
+surfaced by changing the output template to one of the JSON variants (`t_JSON_3164` or `t_JSON_5424`, depending on RFC message type). See
+[SC4S metadata configuration](https://splunk-connect-for-syslog.readthedocs.io/en/develop/configuration/#sc4s-metadata-configuration) for
+more details.
+
+**IMPORTANT!** Be sure to turn off the `RAWMSG` variable when you are finished, as it doubles the memory and disk requirements of sc4s. Do not
+use it in production!
+
+* Lastly, you can enable the alternate destination `d_rawmsg` for one or more sourcetypes. This destination will write the raw messages to the
+container directory `/opt/syslog-ng/var/archive/rawmsg/` (which is typically mapped locally to `/opt/sc4s/archive`).
+Within this directory, the logs are organized by host and time. This method can be useful when raw samples are needed for events that
+partially parse (or parse into the wrong sourcetype) and the output template is not JSON (see above).
+
+## "exec" into the container (advanced)
 
 You can confirm how the templating process created the actual syslog-ng config files that are in use by "exec'ing in" to the container
 and navigating the syslog-ng config filesystem directly. To do this, run
@@ -98,18 +158,27 @@ When debugging a configuration syntax issue at startup the container must remain
 
 ## Dealing with non RFC-5424 compliant sources
 
-If a data source you are trying to ingest via SC4S claims it is RFC-5424 compliant however you are getting a log message processing error this might be happening.
-
-Unfortunately multiple vendors claim RFC-5424 compliance without fully testing that they are. The SC4S error message uses >@< to indicate where the error occurred. Here is an example error message…
+If a data source you are trying to ingest claims it is RFC-5424 compliant but you are getting an "Error processing log message:" from SC4S,
+the message violates the standard in some way. Unfortunately multiple vendors claim RFC-5424 compliance without fully testing that they are.
+In this case, the underlying syslog-ng process will send an error event, with the location of the error in the original event marked with
+`>@<`. Here is an example error message:
+
+```
+{ [-]
+   ISODATE: 2020-05-04T21:21:59.001+00:00
+   MESSAGE: Error processing log message: <14>1 2020-05-04T21:21:58.117351+00:00 arcata-pks-cluster-1 pod.log/cf-workloads/logspinner-testing-6446b8ef - - [kubernetes@47450 cloudfoundry.org/process_type="web" cloudfoundry.org/rootfs-version="v75.0.0" cloudfoundry.org/version="eae53cc3-148d-4395-985c-8fef0606b9e3" controller-revision-hash="logspinner-testing-6446b8ef05-7db777754c" cloudfoundry.org/app_guid="f71634fe-34a4-4f89-adac-3e523f61a401" cloudfoundry.org/source_type="APP" security.istio.io/tlsMode="istio" statefulset.kubernetes.io/pod-n>@
+   PROGRAM: syslog-ng
+}
+```
+
-
-In this example the error can be found in, statefulset.kubernetes.io/pod-n>@
+In this example the parse fails within the SDATA block at the point marked by `>@<`: the field name beginning with
+`statefulset.kubernetes.io/pod-n` appears to exceed the 32-character maximum that RFC 5424 allows for SDATA parameter names.
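Since most compliance problems show up in the message header or the SDATA block, a quick first check of a captured sample can save time before digging into the full grammar. The following is only a rough sketch (a real validator must follow the complete RFC 5424 ABNF), and the sample message is hypothetical:

```shell
# Rough check: does the message start with "<PRI>VERSION TIMESTAMP HOSTNAME APP-NAME ..."?
# This is NOT a full RFC 5424 validator; SDATA is where most vendors go wrong
# (for example, field names longer than the 32 characters the RFC allows).
msg='<14>1 2020-05-04T21:21:58.117351+00:00 myhost myapp 1234 - - hello world'
if printf '%s\n' "$msg" | grep -Eq '^<[0-9]{1,3}>[0-9] [0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:.]+([+-][0-9]{2}:[0-9]{2}|Z) [^ ]+ [^ ]+ [^ ]+ [^ ]+ '; then
  echo 'header looks RFC 5424-like'
else
  echo 'header is malformed'
fi
```

A message that passes this check can still fail inside SDATA, as in the example above, so always playback the raw sample through sc4s as the final test.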