Scopr Change History
====================

XBow SDK
--------

6.0.835
Added de-obfuscation of powershell.exe command line attack.
* Thus the infected .bat file reports Scopr:AntiMalware:Malware=ObfuscatedPowerShell and extracts the obfuscated URL.
Added extraction of the MSIP_Labels MIME header (Microsoft Information Protection SDK Metadata).
* The labels are added to the MIME object's details.
* If the parent is XBOW_TYPE_MIME and the parent has an MSIP_Labels header (that is, MSIP_Label_xxx_Enabled=true; in its GDPRDetails) and the document does not have an MSIP_Label_xxx_Enabled=true; in its GDPRDetails, add "nomsiplabel;" to the object's details.
Fixed setting of qwUncompressedSize for extracted binarized images.
Fixed crash on de-allocated subject CkString buffer during recursive MIME part processing.
* Now subject string is copied to a stack buffer instead.
Added per-engine XAPI option StripComments.
* If set on a text engine that supports it, produces child objects where comments have been stripped.
Added XAPI configuration options BarcodeShrinkPercent, MinBarcodeShrinkFrameX, and MinBarcodeShrinkFrameY.
* If the original image width >= MinBarcodeShrinkFrameX or height >= MinBarcodeShrinkFrameY, shrink/reduce the size of the image by BarcodeShrinkPercent (1-99; all other values do nothing).
Added XEPS engine to convert Encapsulated Postscript (EPS) vector drawings to PNG.
Added XJXR engine to convert JPEG XR (eXtended Range) images to BMP.
* JXR is natively supported by Windows 11.
Added XBOW_FLAG2_NO_DOCUMENT_TEXT (JSON "NoDocumentText") flag.
* Set on PDF, OLE2, and OneNote documents that do not have any body text.
Added Alpine Linux x64 build of scoprd.
* Includes Chilkat 10.1.3 MIME parser.
Added extraction of SegWit and Taproot Bitcoin addresses.
Improved SVG file type identification.
Added EPS, JXR, BPG, APNG, HEIF, CRAMFS, EXT, NTFS, SQUASHFS, HDF5, terminfo, and scr_dump file type identification.
Added check of extraction limits even after base64 extraction fails.
* It is common to find a long series of base64-looking strings that we do not want to process due to configuration settings, but we still need to check extraction limits - particularly time.
* But we do not want to count the string as an extracted object since it was not extracted.
Improved BIFF extraction.
Improved UDF file type identification.
Added MaxBarcodeZoomFrameX. Maximum image width for zoomed barcode processing.
Added MaxBarcodeZoomFrameY. Maximum image height for zoomed barcode processing.
Added MinBarcodeZoomFrameX. Minimum image width for zoomed barcode processing.
Added MinBarcodeZoomFrameY. Minimum image height for zoomed barcode processing.
Added stability/security enhancements to OLE2 engine.
* Prevents arbitrarily large heap allocations due to malicious/malformed/corrupt OLE2 file data.
Added logic to prevent extraction of raw data from SVG xlink:href="" references where the URL starts with "data:".
Upgraded to plutosvg 0.0.7.
* Fixes an infinite loop that was causing minor problems in production.
* Significantly better SVG support in general.
Added logic to convert UTF-16 and UTF-32 XML and SVG documents to UTF-8 children before processing.
Added JSON response fields “autodecryptable” (for encrypted files where auto-decrypt is supported) and “version” (the scoprd version that produced the JSON response).
Added logic to generically extract 34-character Bitcoin addresses that have been split in half and are separated by a colon within the 32 characters following the first half.
* If the length of the string following the colon is exactly 17 (half of 34), that string is appended to the first 17 characters to produce a possible 34-character Bitcoin address. If the resulting address is valid, including its checksum, it is extracted.
* Added “splitbtc=” to the object’s details.
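
The split-address reconstruction described in the last entry can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SDK's implementation: the function names are hypothetical, skipping spaces after the colon is an assumption, and only legacy Base58Check addresses are validated.

```python
import hashlib
import re

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
B58_RUN = re.compile(r"[1-9A-HJ-NP-Za-km-z]+")


def base58check_valid(addr):
    """Return True if addr decodes to 25 bytes with a correct 4-byte checksum."""
    n = 0
    for ch in addr:
        idx = B58_ALPHABET.find(ch)
        if idx < 0:
            return False
        n = n * 58 + idx
    # Each leading '1' encodes one leading zero byte.
    pad = len(addr) - len(addr.lstrip("1"))
    raw = b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(raw) != 25:
        return False
    payload, checksum = raw[:-4], raw[-4:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum


def find_split_addresses(text):
    """Hypothetical helper: rejoin 17+17 character halves separated by a colon."""
    found = []
    for m in B58_RUN.finditer(text):
        first = m.group()
        if len(first) != 17:
            continue
        # The colon must appear within the 32 characters after the first half.
        colon = text[m.end():m.end() + 32].find(":")
        if colon < 0:
            continue
        pos = m.end() + colon + 1
        while pos < len(text) and text[pos] in " \t":  # assumption: skip blanks
            pos += 1
        tail = B58_RUN.match(text, pos)
        if tail and len(tail.group()) == 17:
            candidate = first + tail.group()
            if base58check_valid(candidate):
                found.append(candidate)
    return found
```

A candidate whose checksum does not verify is simply dropped, which mirrors the "valid, including its checksum" condition in the note.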
Improved metadata extraction by using an up to 5MB read buffer (based on the file size) instead of a constant 8K buffer.
Added new XEVENT callback XBOW_EVENT_OBJECT_MALFORMED to notify XAPI clients when a malformed file/object is detected.
Detected malformities:
* XAPI definition                                    XAPI value  Description
  -------------------------------------------------  ----------  -----------------------------------------------------------------------
  XBOW_MALFORM_NONE                                  0           No malform reason
  XBOW_MALFORM_JPEG_CORRUPT                          1           JPEG data is corrupt
  XBOW_MALFORM_RTF_BIN_NEGATIVE_N                    2           RTF /bin control word is followed by a negative value of N
  XBOW_MALFORM_MISSING_HTML_TAG_ATTRIBUTE_NAME       3           Missing HTML tag attribute name
  XBOW_MALFORM_VBE_ENCODED_SIZE_TOO_LARGE            4           Advertised VBE encoded data size exceeds file size
  XBOW_MALFORM_VBE_ENCODED_SIZE_TOO_SMALL            5           Advertised VBE encoded data size is less than file size
  XBOW_MALFORM_VBE_ENCODED_SIZE_MISMATCH             6           Advertised VBE encoded data size does not match actual size
  XBOW_MALFORM_PDF_HEADER_AT_NONZERO_OFFSET          7           PDF header found at non-zero offset
  XBOW_MALFORM_RAR_HEADER_AT_NONZERO_OFFSET          8           RAR header found at non-zero offset
  XBOW_MALFORM_MS04_028                              9           MS04-028: Buffer overrun in JPEG processing could allow code execution
  XBOW_MALFORM_CVE_2018_11212                        10          CVE-2018-11212: JPEG divide-by-zero denial of service vulnerability
  XBOW_MALFORM_PDF_CORRUPT                           11          PDF header is present, but PDFlib TET failed to parse as a valid PDF
  XBOW_MALFORM_INVALID_OFFICEARTRECORDHEADER_RECLEN  12          Invalid OLE2 Data stream OFFICEARTRECORDHEADER recLen field (32-bit image size)

6.0.802
Added XAPI configuration option AddBarcodeBorderPixels.
* 0 = off (default)
* 1-5 = width of border to add to image
* Useful for detecting QR codes that have been intentionally cut off at the border.
* If AddBarcodeBorderPixels is set, the position of the barcode in the original image is reported, not the position in the larger image where the border has been added.
* ZBar barcode detections report when the detected & extracted barcode crosses the edge of the original image into the optionally added black border.
* The detail is called bordercross and its possible values are left, right, top, bottom, or a comma-separated combination of these.
* For example: bordercross=left,bottom; means the barcode was detected in the bottom-left corner of the image and required both the left and bottom borders to be present for the barcode to be detected.
Moved much of the XAPI-internal (non-engine) logging to the debug log level to reduce the amount of info log level output.
Added XAPI option BarcodeBorderGrayscaleColor.
* When AddBarcodeBorderPixels is not 0, use this grayscale color value for the border.
* The range is 0-255 where 0=100% black and 255=100% white.
* This option is used to increase the contrast between the border and the barcode.
* This allows ZBar to detect more barcodes at the border.

6.0.779
Added OLE2 'Data' stream extraction of BMP, JPEG, PNG, and TIFF images.
* Addresses QR codes stored in the ‘Data’ stream.
Added identification of XOR and RC4-encrypted Office 95, 97, 2000, XP, and 2003 documents.
The original 3rd-party Cybozulib minixml.hpp code written in 2012 does not support newer encrypted XLSX documents where the EncryptedPackage stream's XML data has a UTF-8 Byte Order Mark (BOM) of 0xEF, 0xBB, 0xBF in front of the usual “
Auto-decrypt key list filepath
Re-factored scoprd protocol handling logic.
* Fixed regression in logic that handled the keys: protocol parameter.
Fixed RAR auto-decrypt regression.
Reverted Hex.cpp to the 6.0.637 code level.
* The re-factored hex sequence extraction logic in 6.0.730 was found to have a minor problem.
* Rather than risk trying to fix it, chose the faster/safer option of reverting the code to the last-known-good version.
Minor re-factoring of the XML engine’s base64 extraction logic to improve performance.
* Now uses a 64K read buffer instead of reading one byte at a time.
If the to-be-extracted MIME message body size returned by the 3rd-party Chilkat MIME parser is the same as the current MIME object's uncompressed size, do not extract it.
* This means Chilkat considers the entire object to be a MIME message body.
* Thus, extracting it would mean processing the exact same object over and over until an extraction limit is exceeded.
Fix for an OLE2 auto-decrypt logic change in 6.0.706 (“Call xcommon_child_autodecrypt_failed after all attempts to auto-decrypt an object have failed”).
* The change did not take into account the OLE2 engine’s unique two-pass auto-decrypt design.
* The auto-decrypt completes successfully, but the encrypted OLE2 document ends up with conflicting auto-decrypt flags.
* The regression was introduced on November 10, 2024.
* Impacts encrypted OLE2 documents only.

6.0.637
Fixed failure to extract some URLs.
* The original legacy 3rd-party logic was checking for the :// sequence, but instead of treating the string as an invalid URL if that sequence is not present it was allowing URLs like http:foobar.com/path/filename.ext which are not valid.
Fixed crash after successful auto-decrypt when the key contains at least one high-ASCII character.
* Note that it is very easy to mistakenly and unknowingly copy+paste passwords containing high-ASCII characters as UTF-8 sequences instead.
* When this happens the password will still look correct, but the bytes are actually different and thus the password will not work.
Fixed Ole10Native stream extraction.
* Off-by-one bug in stream name comparison was skipping Ole10Native streams.

6.0.626
Added XBOW_BARCODE_READER_FLAG_ENABLE_MONOCHROME option to BarcodeReaderFlags.
* Not enabled by default.
* Converts grayscale image to black and white only based on threshold options.
To match the behavior of barcode scanners, added logic to pre-pend "http://" to any barcode payload strings that look like URLs.
* Specifically, if the payload string consists of:
  * One or more characters followed by a period (which must be the last period in the string).
  * The period must be followed by at least two characters.
  * Followed by two or more characters in the set [A-Z,a-z].
  * And contains only alphanumeric characters.
Fixed XOLESS engine regression caused by a fix that was intended to address corrupt FAT/miniFAT crashes.
* This involved a significant re-factoring of the original 3rd-party cfb.hpp logic.
Added .png as a valid extension for ICO files.
Added optional per-URL context to extracted URLs.
Fixed off-by-one bug in HTML engine's buffered URL search and extract loop.
Fixed crash in XARJ engine's original 3rd-party code (found during stress testing).
Added file type identification for Universal Data Link (UDL) files.
Added MaxImageMemory XAPI option to control the maximum number of bytes allowed to be allocated to process an image.

6.0.613
Fixed crash in XQR engine if malloc fails (i.e. out of memory) when processing a "double-QR" (QR within a QR).
Applied fix for CVE-2023-40889 to zbar 0.23.90 library.
Added logic to prevent an infinite recursion loop in zbar 0.23.90 qr_reader_match_centers().
XQR engine now pre-pends "http://" to extracted barcode payloads that look like URLs without a scheme.
* This behavior is a catch-22.
* Some barcode readers do this, while others do not.
XTEXT engine now supports optional extraction of ASCII-art barcodes as PNG images.
Added barcode type to ASCII-art object's details.
XPDF engine now sets PDFlib TET's outputformat option to utf8.
* This forces TET to return the PDF document body text as a UTF-8 string instead of UTF-16 or UTF-32.
* This is necessary in order to correctly process ASCII-art barcodes.
Significantly improved XQR engine barcode processing logic.
Added several new XAPI configuration options to enhance barcode processing.
XTEXT engine now treats SVG files the same as HTML with respect to HTML and ASCII-art processing.
Added info, warning, and error XAPI logging options.
Added MSC (OLESS and XML formats) and MSCIL file type identification.
Added XMSCIL engine to extract BMP icons from MSC files.
Fixed XTEXT engine's ExtractHex option (it was always enabled).
Fixed numerous data validation issues (crashes) in XACE engine's BASE_EXTRACT_DecompressFile() function and its dependencies.
Fixed XOLESS engine's legacy parsing logic that was not validating the SectorShift and MiniSectorShift header fields and added validation of block numbers vs. the file size.
Disabled all remaining printf() debug output.
* Such output interferes with the scoprd and XRay protocols.
Fixed a crash in the XISO9660 engine's legacy code.
* Found during corrupt data stress testing.
Fixed a crash in the XEAPPX engine's legacy code.
* Found during corrupt data stress testing. If the input APPX XML contained Name=" but the closing " was missing, the bug caused a NULL pointer reference.
Fixed a crash in the XACE engine's legacy code.
* Found during corrupt data stress testing.
* Added bullet-proofing logic to the BASE_EXTRACT_DecompressFile() function and its dependencies.

6.0.572
Disabled SQ code scanning in zbar library due to performance issues.
Moved load of configuration options KEYNAME_PORT, KEYNAME_BINDIPV4, and KEYNAME_MAXSCOPRDCONNECTIONS to LoadConfiguration() in scoprd.cpp.
* Previously these global connection-related variables were being loaded via the COptions constructor which was too late for the first connection.
Improved logging; added LOGFLAG_INFO, LOGFLAG_WARNING, and LOGFLAG_ERROR.
Re-factored LZMA SDK logic in order to support double-quote characters in 7ZIP passwords.
Added SOReuseAddr and SOReusePort options to scoprd configuration.
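
The notes do not say how SOReuseAddr and SOReusePort are applied. Assuming they correspond to the standard SO_REUSEADDR and SO_REUSEPORT socket options, a listener honoring them might look like the Python sketch below. The function and parameter names are hypothetical and this is not scoprd's code.

```python
import socket


def make_listener(host, port, so_reuse_addr=True, so_reuse_port=False):
    """Create a listening TCP socket with optional address/port reuse."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if so_reuse_addr:
        # Lets a restarted daemon rebind while old connections sit in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if so_reuse_port and hasattr(socket, "SO_REUSEPORT"):
        # Not defined on every platform, hence the hasattr() guard.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen()
    return sock
```

Both options must be set before bind() to have any effect, which is why the sketch applies them first.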
BarcodeReaderFlags
* 0x00000004 - If set, the XQR engine will generate and barcode scan a 2nd, inverse-contrast version of the original image
* 0x00000008 - If set, the XQR engine will apply/use the MinBarcodeWhiteThreshold and MaxBarcodeBlackThreshold options
OCRFlags
* 0x00000004 - If set, the XOCR engine will apply/use the OCRMinWhiteThreshold and OCRMaxBlackThreshold options
Added scannum=1 or 2 to object details when at least one barcode is found
* Indicates which barcode scan (1st or 2nd) detected at least one barcode

6.0.563
Critical fix for semi-infinite loop in iCalendar file type identification logic for files that exceed the max extract ratio (e.g. 500:1).
Optimized performance of iCalendar and PowerShell file type identification logic.
Added JPEG 2000 file type identification (JP2, JPM, JPX, MJP2).

6.0.430
Re-factored XONE OneNote engine to support auto-decrypt key discovery.
Improved PowerShell file type identification.
[VR226393] When decoding hex-encoded strings, allow commas to separate each 2-character hex value (e.g. '0d,0a,45,51').
[VR226393] Fixed rare infinite loop in ICalendar parser.
Added MFS, HFS, and HFS+ file type identification.
Added logic to the XONE engine to identify encrypted OneNote sections.
* The XONE engine does not yet support auto-decrypt for OneNote documents containing one or more encrypted sections.
Increased CxImage library's MAX_DIB_SIZE from 25MB to 50MB to support 2x larger images of any/all types.
Use .mp4 as the data-type-correct extension for files identified as either ISO-14496-MP41 or ISO-14496-MP42.
Fixed crash when transient key list file specified by a "keys:" parameter in the scoprd protocol has a combined/cumulative key length of over 256 characters.
Removed scoprd_filters.csv from scoprd installer RPM.

6.0.412
Added XONE OneNote extraction engine.
* Extracts strings, URLs, and embedded files.
Added support for buffer-based file type identification for ICalendar/VCalendar files.
Fixed XZIP engine crash caused by very specific values in the compressed file size field of the ZIP header for files that are both STORED and AES-encrypted.
Added .wmz (gzip'd WMF) as a valid GZIP extension.
Fixed XBCRYPT engine logic such that it will identify BCrypt files even when they can not be auto-decrypted.
Enhanced XONE OneNote extraction engine.
* XONE now supports the ExtractText=1 option and behaves the same as the XOLESS and XPDF engines when it is enabled.
* That is, all strings are extracted to a single file, with each string terminated by a line-feed character.
* This addresses OneNote files containing large numbers of strings and thus unnecessarily exceeding extraction limits.
Added MaxDepthToHash option to scoprd.conf.

6.0.394
Improved TAR file type identification and parsing.
Convert '#' characters in filenames to underscores because they can't be used directly in a URL.
Added MaxHeapSize XAPI configuration option.
Replaced all malloc, realloc, and free calls with equivalent centralized heap management functions.
Heap management includes MaxHeapSize enforcement and peak heap usage logging.
Added RAR major format version to RAR object details.
Added extraction of Javascript code blocks from SVG XML documents (VR211891).
Added logic to ignore possible auto-decrypt keys that are valid base64 sequences of MaxDiscoverableBase64KeySize or more characters.
Files initially identified as XBOW_TYPE_JAVASCRIPT are no longer adjusted to XBOW_TYPE_UNKNOWN when duktape fails to compile the file as JavaScript.
Fixed BASE64 discovery code such that it correctly ignores "" (empty) quoted sequences.
Fixed XRAR engine to use xcommon_wchar_t_to_utf8 to convert Unicode RAR filenames to UTF8.
The base64 decoder in the XText engine now supports decoding reversed base64.
* This means equal sign characters are valid at the beginning of an otherwise valid base64 sequence.
Fixed PDF processing cases where the master key is set, but the user key is not.
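
As an illustration of the reversed base64 case mentioned in this release (equal signs at the start of an otherwise valid sequence), here is a minimal Python sketch. The function name is hypothetical, and treating a leading "=" as the only signal of reversal is an assumption. The XText engine's actual detection logic is not described in these notes.

```python
import base64
import binascii


def decode_base64_maybe_reversed(text):
    """Decode base64, reversing the input first if it starts with padding."""
    candidate = text.strip()
    if candidate.startswith("="):
        # Padding belongs at the end, so a leading '=' suggests reversal.
        candidate = candidate[::-1]
    try:
        return base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return None  # not valid base64 in either direction


encoded = base64.b64encode(b"hello").decode("ascii")
decoded = decode_base64_maybe_reversed(encoded[::-1])
```

Reversed sequences whose original form has no padding carry no leading "=" and would need a different heuristic, which this sketch does not attempt.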
Re-factored the libpng source code in order to remove all use of the dangerous and problematic setjmp/longjmp functions.
Added logic to extract an HTML entity-decoded version of any HTML file that contains at least one HTML entity.
Re-factored the X7ZIP engine's LZMA decompression logic to minimize heap usage.
Added MinDiscoverableKeySize XAPI option.
* Minimum auto-decrypt key size to discover.
* Highly recommended that this option be set to 3.
* Bad actors rarely use 1 or 2-character keys because it is relatively trivial to brute-force all such key combinations.
Added MaxDiscoverableBase64KeySize XAPI option.
* Maximum auto-decrypt key size for alpha-numeric keys that may also simply be part of a base64-encoded block of data.
* Highly recommended that this option be set to at least 32.
* This will generally cause auto-decrypt key discovery to ignore any large sequences of base64-encoded data.
* This option is critically important in order to discover the correct auto-decrypt key if the key follows a large amount of base64-encoded data.
Fixed infinite loop regression in CxImage PNG library caused by setjmp/longjmp re-factoring.
Fixed logic error in original implementation of decrypted PDF child generation.
* If a PDF's auto-decrypt key is found, that's enough to generate the Decrypted PDF Child.
* Original implementation required at least one child file to be auto-decrypted + extracted first.
Fixed rare infinite loop in ACE engine's auto-decrypt logic.
Added support for four new scoprd.conf AutoDecryptFlags to begin dealing with one and two-character encryption keys:
* 0x00000008 - Add the 62 one-character alpha-numeric keys a-z, A-Z, 0-9 to the transient key list
* 0x00000010 - Add the 100 two-character 0-9 key combinations to the transient key list
* 0x00000020 - Add the 676 two-character a-z key combinations to the transient key list
* 0x00000040 - Add the 3,844 two-character alpha-numeric a-z, A-Z, 0-9 key combinations to the transient key list
Additional improvements to make sure infinite loops do not occur in the RIFF and PNG parsers.
Added VHDX engine.
If a BZIP2, GZIP, or XZ file's extension indicates that it is expected to contain a TAR (e.g. .tbz, .tar.bz, .tgz, .tar.gz, .txz, .tar.xz), append .tar to the extracted child object's name.
* There is no guarantee that .tar will be correct, but the hint is now preserved.
Stop processing a TAR if it contains two consecutive headers containing nothing but 0x00 and/or 0x30.
Allow TARs with an all-null magic field.
Increased size of HTML parse buffer from 1MB to 10MB to deal with recent malicious HTML samples.
Added support for converting SVG (XML-based vector images) to PNG.
Added SVG to list of file types supported by XQR/zbar engine.
Added SVG to XAPI configuration option BarcodeReaderFlags.
Fixed a crash that can occur when parsing a long non-null-terminated XML line.
Fixed corrupt data vulnerability in the XLHA engine.
* The XLHA engine reads the LHA header's original size field as a 32-bit signed value.
* If the high bit is set, the original size can be negative.
* The original code did not expect such a case and behaved incorrectly.
* Now when this happens, the corrupt LHA header and associated file is skipped.
Added default value documentation to default scoprd configuration files (no changes to any values).
Added XAPI per-engine configuration option ExtractDeleted.
* If set on an engine that supports it (e.g. XMBR), extracts deleted files.
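
The key counts quoted for the four AutoDecryptFlags bits earlier in this release (62, 100, 676, and 3,844) can be reproduced with a short Python sketch. The flag values come from the notes. The table and function names are hypothetical, and how scoprd actually builds its transient key list is not shown here.

```python
import itertools
import string

DIGITS = string.digits
LOWER = string.ascii_lowercase
ALNUM = string.ascii_lowercase + string.ascii_uppercase + string.digits

# Flag bit -> (alphabet, key length), following the list in the notes.
FLAG_KEYSETS = {
    0x00000008: (ALNUM, 1),   # 62 one-character keys
    0x00000010: (DIGITS, 2),  # 10 * 10 = 100 keys
    0x00000020: (LOWER, 2),   # 26 * 26 = 676 keys
    0x00000040: (ALNUM, 2),   # 62 * 62 = 3,844 keys
}


def transient_keys(flags):
    """Return the short keys implied by the given flag bits (duplicates kept)."""
    keys = []
    for bit, (alphabet, length) in FLAG_KEYSETS.items():
        if flags & bit:
            keys.extend(
                "".join(chars)
                for chars in itertools.product(alphabet, repeat=length)
            )
    return keys
```

Note that the 0x40 set is a superset of the 0x10 and 0x20 sets, so combining those bits in this sketch yields repeated keys; a real key list would presumably de-duplicate them.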
Added MaxExtractVirtualDiskSize XAPI configuration option.
Added MaxSizeToHash XRay/scoprd configuration option.
Added FAT as a possible secondary file type for MBR file type identifications.

6.0.360
Added logic to extract and decode rot13-encoded data as in VR199717.
Added MKV (Matroska audio/video) and ONE (OneNote section/page) file type identification.
Extract metadata from WAV INFO list chunks: IART, ICMT, ICOP, iurl.
Added .onepkg to valid CAB NCE extension list.

6.0.355
Fixed crash in CxImage 7.0.2.
* CxImage is a 3rd-party library which has not received an update in over 11 years.
* The CxImageGIF::out_line() method in ximagif.cpp and related code in ximaiter.h does pointer arithmetic with a variable that could be NULL.
* The crash is generally triggered by corrupting the GIF image frame's biWidth and/or biHeight header values such that they are higher than they should be.
Fixed regression/crash in XZIP engine when identifying some relatively obscure ZIP-based sub-types.

6.0.354
Added logic to XZIP engine to re-process encrypted ZIP children that failed to auto-decrypt on the first pass.
* Addresses malware campaigns that intentionally place the decrypt key in email attachment(s) such that the key does not get discovered until after the first unzip pass.
Added file type identification for appx, appxbundle, eappx, and eappxbundle.
Added XEAPPX engine.
* XEAPPX can only extract non-encrypted files.
Added PDFlib PLOP support to XPDF engine.
* The XPDF engine now converts and extracts encrypted PDFs to functionally equivalent non-encrypted PDFs.
Added QOI file type identification and engine to extract QOI images as BMPs.
(Black Duck) Upgraded nlohmann json-develop json.hpp 3.9.1 to 3.10.4.
(Black Duck) Upgraded XZUtils 5.2.3 to 5.2.5.
(Black Duck) Upgraded pcre2 10.33 to 10.37.
(Black Duck) Upgraded libcurl 7.64 to 7.80.
(Black Duck) Upgraded libxls 1.5 to 1.6.2.
(Black Duck) Upgraded libzstd 1.4.3 to 1.5.0.
(Black Duck) Upgraded libjpeg to version 9d.
(Black Duck) Upgraded unrar 5.8.3 to 6.1.3.
(Black Duck) Upgraded libressl 3.0.2 to 3.4.2.
No longer extract raw binary VERSIONINFO resource from EXEs; redundant since VERSIONINFO gets extracted in text/RC (Resource Compiler) form.
Added appx, appxbundle, eappx, eappxbundle, qoi, tvg, vbe, vba, and xap file type identification.
Added eappx/eappxbundle engine.
Added qoi engine.
Added PDFlib PLOP 5.4p4 to PDF engine to convert encrypted PDFs to their decrypted equivalent PDF for AV scanning purposes.
Added .lz as a valid LZIP extension.
Fixed extremely rare infinite loop in RIFF parser.
Fixed 7ZIP engine’s handling of LZIP, LZMA, and PPMD archive files; re-factored/corrected 7ZIP engine’s auto-decrypt logic to handle a wider range of encrypted 7z cases.
Fixed BCRYPT engine’s auto-decrypt logic.
Added support for detecting Log4Shell attack strings; added DetectLog4Shell XAPI configuration option (default is off).
(Black Duck) Upgraded LZMA SDK 19.00 to 21.07.
(Black Duck) Upgraded unrar 5.8.3 to 6.1.3.
Enabled X7ZIP engine's support for all 7Zip compression methods that are available via LZMA SDK 21.07.
Added XUDF engine to support extraction of Universal Disk Format images.
Added MP3, PGF (Progressive Graphics File), PNM (Portable AnyMap and associated sub-types), and Egress SWITCH file type identification.
Improved batch, Perl, JS, and VBS file type identification reliability.
Fixed relatively rare (1:256) auto-decrypt bug in XZIP engine.
Added "msftremoteobjecttargetusesie" boolean flag to JSON response; set to 1 when an XML object contains a Microsoft-specific remote object Target URL that ends with either .htm! or .html!; useful for identifying Follina attacks.
Improved scoprd license key log output to make it clear when a license key has or has not expired and when a license key never expires.
(Black Duck) Upgraded LZMA SDK 19.00 to 21.07.
(Black Duck) Upgraded unrar 5.8.3 to 6.1.3.
(Black Duck) Upgraded XZUtils 5.2.3 to 5.2.5.
Improved JavaScript file type identification.
* Specifically, to deal with use of a single line of JavaScript containing document.write(window.atob(''))
Added support for extracting base64 sequences following window.atob where the base64 string is delimited by single quote characters.
Added "location.replace" and "window.location" to JavaScript file type identification keyword list.
Re-added "let" as both VBS and JS file type identification keyword.
Fully enabled the XZ engine on non-Windows platforms.
Added logic to detect when an HTML tag is missing an attribute name between LWSP and '='.
* When detected, sets XBOW_FLAG_MALFORMED and XBOW_FLAG_SUSPICIOUS and malformoffset and malformreason to "Missing HTML tag attribute name".
(Black Duck) Upgraded duktape 2.6.0 to 2.7.0.
Improved HTML file type identification.
Improved CxImage malformed image reporting infrastructure.
Disabled XBOW_MALFORM_MISSING_HTML_TAG_ATTRIBUTE; too many FPs.
Added logic to XPDF engine to call PDFlib TET APIs to extract all URL annotations when the ExtractAbsoluteURLs option is enabled.

6.0.294
Temporarily disabled ZIPX compression method 95 (XZ) support due to discovered instability.
Added file type identification and details for the B1 archive file format.
Added "pkcs7encrypted" and "pkcs7signed" properties for S/MIME objects.
Added APK as a ZIP file sub-type.
* This also fixes issue with APKs being identified as JARs.
Added XAPI option DisableFastBase64=0|1.
* If set to 1 on the xtext engine, a hardware-agnostic base64 decoder (same one used in Chromium) is used instead of the default, faster, Intel 4th gen, AVX-based assembly-language base64 decoder.
Skip metadata extraction for text files if none of the metadata extraction options are enabled.
* This is a performance enhancement for text files.
Set MAXIMUM_POWERSHELL_LINES to 1000.
* Stop analyzing text files for PowerShell keywords at that point.
* This is a performance enhancement for large text files.
The filename given to base64-encoded data URIs extracted from HTML documents now includes the extension associated with the URI's media type.
Added CHECKEXTRACTLIMITS calls to all RAR extraction I/O loops to ensure the timeout limit is enforced.
Fixed RAR 4.x auto-decrypt.
Upgraded to PDFlib TET 5.3p3.
Re-introduced split ZIP identification.
Added .ntx as a valid yEnc extension.
Added ZIP:PPAM file sub-type identification.
Improved JavaScript file type identification.
Added PDF auto-decrypt support.
Re-factored the ZIP engine's auto-decrypt loop such that it runs through the very fast 1-byte password validation check first, then only decrypts the entire file when that check passes.
* This is an auto-decrypt performance enhancement.
* On average the 1-byte validation will match 1 out of every 256 keys attempted.
* Only then compute the CRC32 of the decrypted data.
* If the CRC32 matches, we know we found the correct key.
* Otherwise the auto-decrypt loop continues.
Completed PDF auto-decrypt support.
PDF engine now extracts attachments and annotations in addition to images and document body text.
Increased TAR extraction I/O buffer size from 8K to 64K to improve performance; 8x fewer loops on large TAR files.
Added hexchars and hexvalues counts to JS object details.
Added support for extracting QR codes out of TIFF images.
Added WOFF and WOFF2 file type identification.
Reduced JS file type identification FPs.
Upgraded CentOS 7 build environment to gcc 9.3.0. Required for CxImage TIFF library (C++11).
Fixed XBIFF engine's extraction logic.
Added JavaScript extraction (including via auto-decrypt) to XPDF engine.
Re-factored icon resource extraction in XEXE engine to make it more reliable and architecturally consistent.
Upgraded to libmspack 0.10.1alpha; high priority per BlackDuck.
Added .xll as a valid NCE for the DLL file type.
Fixed relatively rare temporary file leak in XOLESS engine.
Disabled CxImage TIFF engine's warning and error handlers because they generate very undesirable message boxes on Windows and stderr output everywhere else.

6.0.226
Added support for extracting document metadata as per-object details from Office (OLE2 and new ZIP-based format) documents.
Added EnableMetadata=0/1 option to scoprd.conf.
* This option can be used to completely disable scoprd-metadata.csv output.
When xapi.conf GDPRCompliantMetadata=1, OLE2, PDF, and DOCX engines only extract non-security-sensitive metadata.
Globally added and enforced default MIN_BASE64_FILE_SIZE == 10 to prevent short base64-looking, but not actual base64 sequences from getting extracted.
Added XAPI option MinExtractBase64Size to override the MIN_BASE64_FILE_SIZE 10 default if desired.
Added per-object XBOW_FLAG_WRONG_EXTENSION bit-flag. Set when the file extension does not match the primary file type.
Added per-object secondary file type and sub-type.
* These are set when a compound file is identified (e.g. BMP+RAR, GIF+ZIP, JPEG+ACE).
Fixed auto-decrypt of RAR files with encrypted filenames.
Added schemas.openxmlformats.org to scoprd excludedurllist.txt.
If an extracted URL matches a URL in the excluded URL list, it is no longer added to the per-object URLs field.
* Previously excluded URLs were extracted, but not followed.
* Now they are neither extracted nor followed.
Fixed Windows EXE VersionInfo resource extraction.
Improved scoprd's xid.py CSV output.
Fixed minor scoprd bug in non-Windows EraseDirectory logic.
* It was not ignoring the '.' and '..' directory entries.
* Although this bug was causing many file remove() errors to be logged, it was not possible for the bug to do any damage to the file system.
Fixed scoprd auto-decrypt key management system.
* The scoprd logic now uses the same logic as Scopr XRay.
Added scoprd.conf option ShowUnidentifiedFiles.
* If set to 1, the output will include unidentified files.
* Default is 0 (i.e. unidentified files are hidden, thus improving processing times).
The password discovery loop in the MIME engine now correctly captures the first and last words in the message body if the word is not followed by a delimiter.
Fixed malformed JSON output when an unidentified file is squelched due to the new ShowUnidentifiedFiles option.
Added "xbowflags" to JSON output.
* It is more efficient to manage this one 32-bit set of bit-flags rather than each bit separately.
* No change to the existing JSON output of the individually-named bit-flags.
Added "attrs" to JSON output.
* This is the same as the "attributes" field except represented as an unsigned integer instead of a hex string.
* This simplifies queries for this field.
* The "attributes" field remains unchanged for now, but may be removed in a future release.
Removed the redundant/implied leading period from the "datatypeextension" field.
Now enforcing XAPI MinExtractedBase64Size configuration option for JavaScript and XML.
Added a per-object GDPRDetails string.
* If the XAPI GDPRCompliantMetadata option is disabled (0), this field contains any extracted metadata that is considered security-sensitive.
* If the XAPI GDPRCompliantMetadata option is enabled (1), this field will be empty (i.e. security-sensitive data is not extracted).
Added "gdprdetails" to JSON output.
* Added a similar field to CSV, HTML, and XML output.
* Same semi-colon-delimited format as "details".
Added extraction of custom properties from OOXML documents as per-object "gdprdetails" because there is no way to determine which custom properties are security-sensitive.
* Thus, all custom properties are security-sensitive.
* The syntax of extracted custom properties is "customproperty_=;".
Treat the last underscore in a filename as a possible double-extension separator (i.e. the next-to-last extension is assumed to follow the underscore).
Improved OOXML document file type identification.
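
The underscore double-extension rule noted above can be illustrated with a small Python sketch. The function name is hypothetical, and the requirement that the text after the underscore be purely alphanumeric is an assumption. The notes state only that the next-to-last extension is assumed to follow the last underscore.

```python
def underscore_double_extension(filename):
    """Return (next_to_last_ext, last_ext) if an underscore hides an extension."""
    stem, dot, last_ext = filename.rpartition(".")
    if not dot:
        return None  # no real extension at all
    _, underscore, hidden_ext = stem.rpartition("_")
    # Assumption: the hidden extension must be a plain alphanumeric token.
    if not underscore or not hidden_ext.isalnum():
        return None
    return hidden_ext.lower(), last_ext.lower()


print(underscore_double_extension("invoice_pdf.exe"))
```

Under these assumptions a name such as invoice_pdf.exe is treated like invoice.pdf.exe, while report.docx yields no hidden extension.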
Disabled XZIP engine's ZIP 1.0 logic since there is no reliable way to tell when encrypted ZIPs were created with a 2-byte ZIP 1.0 password validation or 1-byte. * Thus auto-decrypt is again susceptible to a 1 in 256 chance of using the wrong auto-decrypt key. * This will be addressed in a future release. Replaced error-prone manually-generated scoprd JSON output with nlohmann JSON library. Added --no-clobber switch to cp command when installing scoprd *.conf files. * Thus existing scoprd.conf and xapi.conf files will not get overwritten during installation. Added getconf:[.conf filename] command to scoprd protocol. * Returns the default /usr/local/scopr/scoprd/xapi.conf or specified XAPI configuration file in JSON form. Added getconf.py sample script to scoprd installation. Added xapi_docsonly.conf sample configuration to scoprd installation. Added .tif as a legitimate TIFF extension. Moved output of successful auto-decrypt key to gdprdetails field. Fixed ZIP engine's handling of STORED AES-encrypted files. * It was extracting 10 more bytes than it should because it was not accounting for the 10 bytes of AES authentication data that follows the raw file data. Added logic to extract custom properties from OLE2 DocumentSummaryInformation streams. Added logic to MIME extraction engine to identify text/html utf-8 attachments containing a contiguous sequence of at least 5,462 high-ASCII characters as Suspicious:CVE-2020-16497. Added PDF custom property extraction. Fixed custom property extraction for OLE2 documents created on non-Windows platforms. Fixed a PDF extraction bug that would sometimes cause PDF processing to stop prematurely. Changed the custom property name prefix to "cp_". Added KeyLengthWeights option to scoprd.conf. * Scales the auto-decrypt score for each key based on the key's length. Added KeyContextKeywords option to scoprd.conf.
* Comma-delimited list of lowercase strings that are commonly expected to be within MaxKeyContext keys of the actual auto-decrypt key. * The auto-decrypt score for keys surrounding keys that contain one or more of the context keywords is increased by the inverse of the distance between the two. Added MaxKeyContext option to scoprd.conf. This is the maximum number of previously discovered auto-decrypt keys on either side of an auto-decrypt key to consider more probable (using the inverse of the proximity) than others. Improved password discovery logic to ignore all HTML tag sequences and thus only look for passwords between HTML tags. * This considerably reduces the average number of auto-decrypt attempts needed before finding the correct key. The ZIP engine was artificially limited to 80-character passwords (hard-coded in the legacy InfoZIP logic). * Increased this to 256 characters - which is the maximum auto-decrypt password length scoprd is capable of supporting. Discovered that the legacy InfoZIP implementation limits the number of bytes at the end of the ZIP file it searches for the central directory structure. * Its default behavior is to scan only up to 66000 bytes at the end of the file. * ZIP examples exist where extraction fails because of this. * Changed the logic so that it will now search the entire ZIP file. scoprd no longer changes the scoprd-metadata.csv filename to "x" when the XAPI GDPRCompliantMetadata option is enabled. The XZIP engine now supports the zstd compression method. Added a new XZSTD engine that supports extracting .zst zstd-compressed files. Added compressionmethodid detail to ZIP objects; useful for recognizing ZIPs using unsupported or undocumented compression methods. Added support for recognizing undocumented ZIP compression method 92. * Since ZIP compression method 92 is undocumented, calling it "duplicatesha1" because that is literally all it is - the 20-byte SHA1 of a file that is present earlier in the ZIP.
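The KeyContextKeywords and MaxKeyContext entries above describe an inverse-distance boost for candidate keys found near a context keyword. A minimal sketch of that idea follows. The function name, the substring match, and the way boosts accumulate are assumptions made for illustration; the actual scoprd scoring, including how it combines with KeyLengthWeights, is not documented here.

```python
def context_boosts(candidates, keywords, max_key_context):
    """Return one boost per candidate key, in discovery order.

    Each candidate that contains a context keyword raises the score of its
    neighbours (up to max_key_context positions on either side) by
    1 / distance. Hypothetical sketch; not the scoprd implementation.
    """
    boosts = [0.0] * len(candidates)
    lowered = [k.lower() for k in keywords]
    for i, word in enumerate(candidates):
        if not any(k in word.lower() for k in lowered):
            continue  # this candidate is not a context keyword
        first = max(0, i - max_key_context)
        last = min(len(candidates) - 1, i + max_key_context)
        for j in range(first, last + 1):
            if j != i:
                boosts[j] += 1.0 / abs(j - i)
    return boosts
```

With candidates ["hello", "password", "s3cret", "foo", "bar"], the keyword "password", and a context of 2, the immediate neighbours each gain 1.0 and "foo" gains 0.5, so the word directly after "password" is tried ahead of more distant ones.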
Added per-object "xbowproperties" 32-bit bit-field value to scoprd JSON response. Fixed OLE2 Unicode custom property extraction crash. Added prototype BIFF engine for extracting images and URLs from OLE2 XLS Workbook streams. Added a "keys:" parameter to scoprd protocol. * Allows scoprd clients to pass a line-feed-delimited list of keys to try during auto-decrypt operations. Added a MinBase64Size XAPI option to control the minimum length (in bytes) of valid base64 sequences to extract. Always set XBOW_FLAG_ENCRYPTED and XBOW_FLAG_NO_KEY for PGP and MCrypt files since auto-decrypt is not supported for either. The ZIP engine now reports the existence of encrypted children even if they fail to auto-decrypt. Added .asc and .sig as valid PGP extensions. Fixed OLE2 EncryptedPackage processing when auto-decrypt fails. Added initial minimal support for reporting new per-object flag XBOW_FLAG_HAS_ENCRYPTED_CHILDREN. * This flag is very useful in queries as it identifies the container files that have encrypted content - as opposed to the encrypted files themselves which are identified by XBOW_FLAG_ENCRYPTED. Added a hard-coded 15 second timeout to scoprd's ComputeHashes function. * This prevents huge files from tying up a scoprd instance for a lengthy period of CPU-bound time. Fixed an infinite loop in the RAR engine's ReadHeader50 and ReadHeader15 logic when auto-decrypting RARs with encrypted headers. Completely disabled the CXJavaScript::extract_string_fromcharcode method due to it causing an infinite loop. * This method will be re-enabled later after an appropriate fix has been verified. Fixed infinite loops in the XTEXT engine's XML parsing logic that extracts various types of scripts out of XML documents. Added support for the "keys::" protocol parameter. Added a key list filepath parameter to scoprc: "scoprc ". * The key list filepath need not exist. * Scoprd ignores (but logs) any key list filepaths that it can not access or open. 
Fixed handling of the MinExtractedBase64Size XAPI option when extracting base64 sequences out of XML documents. * The option is now enforced correctly in this scenario. Added scoprd configuration options KeysFilepathPrefix and ProcessFilepathPrefix to lock down the locations that are allowed to be used in the scoprd protocol for the respective fully-qualified filepaths. * These options should be used to prevent unexpected and malicious filepaths from being opened by scoprd. Fixed an infinite loop in the AVI and WAV (a.k.a. RIFF) file type identification logic. * In rare cases it would get stuck trying to read the next RIFF chunk (a 4 byte RIFF chunk ID) when the EOF has already been reached. For encrypted OLE2 documents, XBOW_FLAG_HAS_ENCRYPTED_CHILDREN is now set on the EncryptedPackage stream's parent object - which is typically the OLE2 document itself. Removed XBOW_FLAG_NO_KEY from processing of raw EncryptedPackage streams - where no auto-decrypt is even attempted. Fixed VDI file type identification. Fixed 1-byte buffer overflow in ZIP engine's auto-decrypt logic if the key to try is 256 characters long. Set XBOW_FLAG_HAS_ENCRYPTED_CHILDREN on the OLE2 EncryptedPackage stream's parent - which is typically the OLE2 document itself. Added slx, slxc, slxp to allowed ZIP extensions. Added mexw32, mexw64 to allowed DLL extensions. After a successful auto-decrypt of an OLE2 EncryptedPackage stream, do not set XBOW_FLAG_ENCRYPTED on the resulting DecryptedPackage stream. Do not set XBOW_FLAG_NO_KEY when processing raw OLE2 EncryptedPackage streams - where no auto-decrypt is even attempted. Fixed infinite loop in WAV and AVI (RIFF) file type identification. Fixed possible infinite or very long loops in PDF engine. Fixed VDI file type identification. Fixed high impact, but extreme end-case buffer overrun issue in ZIP auto-decrypt logic. Added diagcab, nupkg, nupack, and xdp as allowed extensions. 
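The RIFF (AVI and WAV) infinite-loop fixes above come down to never attempting to read a chunk header once the end of the data has been reached. The sketch below shows one way to walk RIFF chunks with that guard in place. It follows the standard RIFF layout (12-byte file header, then chunks made of a 4-byte ID, a 4-byte little-endian size, and data padded to an even length); the function name is hypothetical and this is not the Scopr identification code.

```python
import struct


def iter_riff_chunks(data):
    """Yield (chunk_id, data_offset, size) for each complete RIFF chunk.

    Stops cleanly on EOF or on a truncated chunk instead of looping.
    """
    if len(data) < 12 or data[:4] != b"RIFF":
        return
    pos = 12  # skip "RIFF", the file size, and the form type (e.g. "WAVE")
    # Only continue while a full 8-byte chunk header is still available.
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        pos += 8
        if pos + size > len(data):
            return  # declared size runs past EOF: treat as truncated
        yield chunk_id, pos, size
        pos += size + (size & 1)  # chunk data is padded to an even length
```

Because the position advances by at least 8 bytes on every iteration and is checked against the data length first, the loop always terminates, even on malformed input.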
Added auto-decrypt performance metrics: autodecryptattempts, autodecrypttime, autodecryptaverage, autodecryptslowest, autodecryptfastest, and keynotfound. Fixed an infinite loop in the XTEXT engine's PowerShell parser. Added ACE auto-decrypt key to GDPR details. Added ALZip auto-decrypt key to GDPR details. Added XAPI option MaxAutoDecryptAttempts. Fixed extraction of LZMA and LZMA2-compressed DAA, ZIP, and 7ZIP files. Improved malformed PDF file type identification and processing. Upgraded to PDFlib TET 5.2.0. Includes fixes for numerous PDF parser stability issues. Added XBOW_SUBTYPE_PAC to support the JS:PAC (Proxy Auto-Configuration) file type. Fixed small buffer overflow in RIFF parser. Disabled MP3 file type identification due to its inaccuracy. Needs work. Updated TET logging options in scoprd. Improved RAR file type identification to identify RAR headers at offsets up to 8K - 7. Upgraded to PDFlib TET 5.2.10. Added XBOW_LOGGING_PDFLIB_TET XAPI configuration option. If set, PDFlib TET logging is enabled in the XPDF engine. Added XBOW_FLAG_MIME_BODY_PART to indicate when an extracted object is a MIME body part as opposed to a MIME attachment. scoprd.conf options ShowUnidentifiedFiles and ShowUnidentifiedStreams can now be set to 2 to only show files or streams that are both encrypted and unidentified in the JSON response, omitting all other unidentified objects. If auto-decrypt fails for an encrypted object, XBOW_FLAG_HAS_ENCRYPTED_CHILDREN must still be set on the parent object. Fixed infinite loop in 3D Studio Max (.3ds) file type identification. Fixed setting of XBOW_FLAG_HAS_ENCRYPTED_CHILDREN. Fixed regression introduced on Sept. 6, 2020 where RAR auto-decrypt only worked for the first encrypted file if all subsequent files use the same key. Show/enumerate encrypted RAR files even if they cannot be auto-decrypted. Added support for detecting suspiciously long underscore sequences as double-extension separators.
Added logic to set the auto-decrypt key on OLE2 DecryptedPackage objects. Improved asterisk-delimited auto-decrypt key discovery logic. Fixed how the X7ZIP engine sets the XBOW_FLAG_HAS_ENCRYPTED_CHILDREN flag. Added logging of PDFlib TET open_document I/O to help track down the cause of a rare infinite loop. Added HTML entity decoding for strings extracted out of HTML that are expected to be URLs. When extracting EXE resource images via the XEXE engine, if a RESTYPE_BITMAP resource has a PNG header, use .png for the object's extension instead of .bmp. Upgraded to PDFlib TET 5.3.0 which includes a new timeout option. Added a timeout check to XPDF engine's PDF body text extraction loop that calls PDFlib TET's get_text() API which fails to return an empty string in some rare cases. Added getver: option to scoprd protocol to have scoprd return its version number. Added horizqtr=1-4 and vertqtr=1-4 details of first bar/QR code symbol. Added size of first bar/QR code symbol's bounding box to object's details. Switched to case-insensitive comparison of special OLE2 stream names like Ole10Native to match MS-Office behavior. Added initial support for extracting ZIPX compression method 95 (XZ). Added call to CHECKEXTRACTLIMITS in pdf-text extraction loop that calls PDFlib TET get_text(). * Prevents long or infinite loops if get_text() fails to return an empty string in a timely fashion (extremely rare, but happens). Improved URL extraction from HTML attributes. * Specifically malicious URL obfuscation cases where bytes in the range 0-32 come immediately before or after the URL (browsers ignore these bytes). Due to malicious encrypted files, the characters ‘?’ and ‘@’ are no longer treated as password delimiters by the key discovery logic. If the MIME child object's Content-Disposition is not "attachment", set the child object's XBOW_FLAG_MIME_BODY_PART flag. 
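Two of the URL-related entries above (HTML entity decoding of expected URLs, and ignoring bytes in the range 0-32 immediately before or after a URL) can be combined in a short sketch. It uses Python's standard html.unescape; the function name is hypothetical, and the order of the two steps is an assumption rather than a description of the XAPI logic.

```python
import html

# Characters with code points 0 through 32 (control characters and space),
# which browsers ignore at either end of a URL attribute value.
_IGNORED_EDGE_CHARS = "".join(chr(c) for c in range(33))


def clean_html_url(raw):
    """Decode HTML entities, then trim bytes 0-32 from both ends.

    Hypothetical helper for illustration; not the Scopr implementation.
    """
    decoded = html.unescape(raw)
    return decoded.strip(_IGNORED_EDGE_CHARS)
```

Decoding before trimming means that padding hidden as character references (such as a leading &#9;) is removed as well, which is the obfuscation case the changelog entry describes.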
Set some reasonable defaults for the two XAPI base64 extraction options in case the options are not specified via an XAPI config file. Improved ZIP auto-decrypt reliability. * Addresses rare cases where an incorrect key gets used. Added
base64-encoded objects and their names from XML MS-Word documents. Upgraded to PDFlib TET 5.3p2. Various NCE updates. 5.0.1309 Added Bitcoin address extraction support to XAPI. 5.0.1301 Added XBOW_TYPE_RPMSG, associated Restricted Permission Message (RPMSG) file type identification and extraction engine. Note: Auto-decrypt of the DRMContent stream is not supported. 5.0.1291 Added XBOW_TYPE_SYLK and associated Microsoft Symbolic Link (SYLK) file type identification. 5.0.1274 Added ALZip auto-decrypt support. 5.0.1270 Added XBOW_SUBTYPE_FPX and associated OLE2 object subtype identification for the Kodak FlashPix file format. 5.0.1244 Added Quick Response (QR) code extraction engine. 5.0.1235 Added XAPI configuration options: * MaxExtractTotalSize * MaxTotalItems 5.0.1233 Added XBOW_TYPE_PYC and Python bytecode file type identification. 5.0.1228 Added XAPI configuration options: * GDPRCompliantMetadata * OLESSParallelDecryptedPackage 5.0.1216 Added Direct Access Archive (DAA) extraction engine. 5.0.1212 Added XBOW_TYPE_DAA and Direct Access Archive file type identification. 5.0.1199 Added decode and extract of JavaScript string arrays via Text Extract Hex option. 5.0.1197 Added RTF metadata and URL extraction. 5.0.1192 Fixed XRay's reported processing elapsed time. 5.0.1177 Improved RTF hex data extraction. 5.0.1173 Added xbowddd.cgi query string parameters documentation page. 5.0.1169 Added extraction of base64-encoded objects out of window.atob("") JavaScript code. Added extraction of