PII Tools Release Notes

v5.5.0, 23 March 2026

v5.5.0 brings integration with Microsoft Purview. You can now apply Microsoft Purview sensitivity labels to your documents directly from within PII Tools.

This new remediation feature lets you combine the power and flexibility of PII Tools' scanning and investigative workflows with Microsoft's complex world of data governance and DLP.

  • ✅ Label documents in Microsoft Purview

    If your organization has an active Purview license and you want to label sensitive documents inside your Microsoft tenancy, get in touch with PII Tools support and we will help you set up this new integration.

  • 🎯 Improved accuracy of the ID and passport detectors.

  • 🎯 Improved accuracy of the name detector.

  • 🎯 Improved accuracy of the home address detector.

  • 🎯 Improved accuracy of the SSN and tax ID detectors.

  • 🎯 Improved accuracy of the routing number and bank account detectors.

  • 🔒 Regular security update of internal dependencies.

  • 🔴 Fixed parsing error on PDFs that contained PDF annotations but no PDF text.

  • 🔴 Fixed parsing error on emails whose entire body consisted of a single BOM (byte-order mark) character.

  • ⚡ Several minor UI fixes, updates and improvements.
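
The BOM fix above is a classic decoding edge case. The following is a minimal Python sketch of one way to tolerate an email body that consists only of a byte-order mark. It assumes the raw body bytes and the declared charset are already available. `decode_email_body` is a hypothetical helper, not PII Tools code.

```python
def decode_email_body(raw: bytes, charset: str = "utf-8") -> str:
    """Decode an email body so that a lone byte-order mark yields an empty string."""
    # "utf-8-sig" behaves like UTF-8 but drops a leading BOM (EF BB BF).
    codec = "utf-8-sig" if charset.lower() in ("utf-8", "utf8") else charset
    text = raw.decode(codec, errors="replace")
    # Other codecs may leave U+FEFF at the start; strip it as well.
    return text.lstrip("\ufeff")

print(repr(decode_email_body(b"\xef\xbb\xbf")))  # ''
print(decode_email_body(b"\xef\xbb\xbfHello"))   # Hello
```

An empty string is a valid result that downstream scanning can handle as "no text", instead of raising a parsing error.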

v5.4.3, 10 February 2026

v5.4.3 brings important accuracy fixes and improvements, most notably to ID scans in multi-page documents.
  • 🎯🔴 Improved accuracy (reduced both false positives and false negatives) of ID scans, Passport scans, Residency Cards and scanned Driver Licenses.

    This is a critical fix for a regression introduced in v5.4.1 that caused multi-page PDFs to sometimes produce incorrect ID detections and miss actual IDs.

  • ✅ Deduplicate email addresses entered into the Root folder of Exchange scans.

    Previously, a Root folder value of john@acme.com,john@acme.com would scan John's mailbox twice. Starting with v5.4.3, PII Tools deduplicates such comma-separated Root folder values. This new behaviour is in line with how root folders are already deduplicated in other connectors and in Root folders uploaded from a file.

  • ✅ Extract the "Last modified" date from emails stored in PST archives, allowing such emails to be filtered by date in Analytics.

  • ⚡ Optimize the size of DB activity logs, for a smaller installation footprint.

  • 🔴 Fixed boot-up error when the LOCK_OUT_AFTER_FAILED_LOGIN_ATTEMPTS server configuration option was set.

  • 🔴 Fixed bug that caused scans launched by uploading hundreds of Root folders from a file to fail.

  • 🔴 Fixed bug where remediations that were actively running when the PII Tools server was restarted could become "stuck".

    Beyond fixing the original bug, v5.4.3 also automatically checks for such "unfinished" remediations on startup and resumes them. No user action is required after the upgrade.

  • 🔴 Several smaller improvements to scanning accuracy, robustness and the web UI.
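
The Root folder deduplication described in the v5.4.3 notes above can be sketched in Python as follows. This illustrates the idea only and is not PII Tools' implementation. `dedupe_root_folders` is a hypothetical helper. It trims whitespace around each entry, keeps the first occurrence of each value in its original order, and compares values exactly; the notes do not say whether matching is case-insensitive.

```python
def dedupe_root_folders(value: str) -> list[str]:
    """Split a comma-separated Root folder value, dropping repeated entries."""
    seen: set[str] = set()
    unique: list[str] = []
    for raw_entry in value.split(","):
        entry = raw_entry.strip()          # ignore spaces around commas
        if entry and entry not in seen:    # skip empty and repeated entries
            seen.add(entry)
            unique.append(entry)           # first occurrence wins, order preserved
    return unique

print(dedupe_root_folders("john@acme.com,john@acme.com"))  # ['john@acme.com']
```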

v5.4.1 = v5.4.2, 22 January 2026

This v5.4.1 release (also re-released as v5.4.2; both are identical) focuses on the efficiency of document remediation. Both Secure Erase and Document Redaction are now 3x faster. Redaction in particular produces up to 20x smaller files, especially with large PDFs.

There is also the usual round of improvements to PII accuracy, UI workflow and bug fixes.

  • ✅ Redesign Drill-down and Risk Summary reports.

    The popular Drill-down and Risk Summary reports underwent a major facelift. These reports are now easier on the eye while retaining their packed information content.

  • ⚡ Speed up remediations.

    v5.4.1 optimizes the performance of document remediation. This means faster processing of documents during Secure Erase and Redact, at the cost of a slightly increased server RAM footprint.

    The speedup is 3x. If you rely on redacting or erasing large numbers of documents, reach out to support@pii-tools.com to improve this factor further.

  • ✅ Reduce size of redacted PDFs.

    PII Tools now automatically detects documents that are primarily black-and-white after redaction, and compresses the redacted output using JBIG2. This leads to 2-50x smaller PDF output. This space saving is critical in workflows that rely on redacting emails with PDF attachments, and in other PDF-heavy repositories.

  • ✅ Scan HEIC image format.

    .heic is a popular photo format, especially in the Apple ecosystem.

  • ✅ Don't scan O365 system drives and libraries.

    Previously, when scanning OneDrive, PII Tools would scan through all storage, including hidden (system) drives and libraries.

    In v5.4.1, PII Tools automatically detects and ignores such system drives and libraries.

  • 🎯 Automatically detect MSG files that are in fact EML.

    Two popular formats for storing emails (incl. attachments) as files are MSG and EML. Some third-party tools store emails in the EML format while giving the file an incorrect .msg extension.

    PII Tools v5.4.1 now automatically detects such misnamed email files and scans them appropriately.

  • 🎯 Improved accuracy of the SSN and tax ID detectors.

  • 🎯 Improved accuracy of the phone number detector.

  • 🎯 Improved accuracy of the medical MBI detector.

  • 🔴 Fixed bug where MSG archives could not be deleted from OneDrive during Secure Erase.

  • 🔴 Fixed bug with linking nested scan IDs in the Person Cards® report.

  • 🔴 Clearer error message for files that FAILED due to Microsoft's Information Rights Management (IRM) or Azure Information Protection (AIP).

  • 🔴 Fixed sniffing of file type from extension-less XLSX documents inside nested file archives.

  • 🔴 Removed a few rarely-used parameters from the "Launch Scan" web form. The UI form is now more streamlined, while these removed parameters are still available through the API for advanced users.
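
The MSG-versus-EML detection described in the v5.4.1 notes above rests on a content check rather than the file extension. Below is a minimal Python sketch of such a check. It assumes that Outlook MSG files are OLE2 compound files beginning with the signature D0 CF 11 E0 A1 B1 1A E1, and that EML files are plain-text messages beginning with header fields. `sniff_email_format` is a hypothetical name, and this is not PII Tools' actual detector.

```python
import email

OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # compound-file magic used by .msg
COMMON_HEADERS = {"from", "to", "subject", "date", "received", "message-id", "mime-version"}

def sniff_email_format(data: bytes) -> str:
    """Return "msg", "eml" or "unknown" from file content, ignoring the file name."""
    if data.startswith(OLE2_SIGNATURE):
        return "msg"
    # EML is plain RFC 5322 text: parse it and look for familiar header fields.
    message = email.message_from_bytes(data)
    if {name.lower() for name in message.keys()} & COMMON_HEADERS:
        return "eml"
    return "unknown"

# An EML message that a third-party tool saved as "mail.msg":
mislabeled = b"From: alice@example.com\r\nSubject: Hello\r\n\r\nHi Bob"
print(sniff_email_format(mislabeled))  # eml
```

Legacy .doc and .xls files share the OLE2 signature. A production detector would therefore also need to inspect the streams inside the compound file rather than stop at the first eight bytes.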

v5.4.0, 7 December 2025

v5.4.0 comes with full support for email redaction. Plus many accuracy improvements.
  • ✅ Support redacting PII within HTML, including inside "rich text bodies" of emails.

    This feature enables new workflows for clients who scan and redact emails.

    PII Tools previously redacted emails by redacting the plaintext email body and attachments. As a result, the redacted email appeared in email clients as plaintext only, without formatting.

    v5.4.0 implements full HTML redaction, which means emails are now redacted completely: PII is redacted inside the rich-text email body, the plain-text email body, the headers and the attachments. The outcome is a surgically redacted email with the same structure as the original email, natively viewable in Outlook and other email clients.

  • ✅ Support automated deployment of MacOS agents via JamfPro.

    To scan fleets of MacOS devices, PII Tools now comes with JamfPro scripts to automate mass deployment. The scripts support both MacOS running on the older Intel architecture (x86_64) and MacOS running on the newer "Apple Silicon" ARM architecture (M1, M2, M3, M4 etc).

  • ✅ New PII type "Residency card" (under National→Scan ID).

  • 🎯 Improved PII detection inside scanned IDs (images).

    This includes more accurate extraction of PII from various ID scans, passports and licenses, as well as better linking of such extracted PII to individuals in the Person Cards® report.

  • 🎯 Improved detection accuracy on SSNs.

  • 🎯 Improved accuracy on non-US phone numbers.

  • 🎯 Improved accuracy on passport numbers.

  • 🎯 Improved accuracy on home addresses.

  • 🎯 Improved accuracy on bank account numbers, credit card numbers and cheque scans.

  • ✅ Added support for scanning Parquet files.

    Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It is used by data analysts, and Parquet files often contain PII. PII Tools now natively scans Parquet files of any size.

  • ⚡ Reduce RAM footprint in scans with many uploaded root folders.

    Previously, whenever users submitted a large text file listing specific files to scan via the "Upload from root folders" button, the scan initialization would consume a lot of RAM. This could lead to out-of-memory (OOM) errors on servers with little spare RAM.

    In v5.4.0, this niche "Upload from root folders" use case was optimized, and users can now upload arbitrarily large root folder lists without risk of OOM.

  • 🔴 Fixed UI bug where browsing through the pages in the Remediations tab would sometimes not display results past the first page.

  • 🔴 Fixed bug where importing PII Tools state (a previously exported JSON state file) would sometimes fail with an error.

  • 🔴 Fixed sniffing of file type from extension-less XLSX documents inside nested file archives.
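
Sniffing an extension-less XLSX document, as in the fix above, comes down to recognising an Office Open XML package inside a ZIP container. The following is a Python sketch that uses only the standard library. `looks_like_xlsx` and `make_package` are hypothetical helpers. Checking for xl/workbook.xml is a simplification: strictly, the main workbook part is located through the package relationships in _rels/.rels.

```python
import io
import zipfile

def looks_like_xlsx(data: bytes) -> bool:
    """Guess whether extension-less bytes are an XLSX workbook."""
    if not data.startswith(b"PK\x03\x04"):  # XLSX is a ZIP package
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            names = set(package.namelist())
    except zipfile.BadZipFile:
        return False
    # Every OOXML package has [Content_Types].xml; Excel keeps its workbook under xl/.
    return "[Content_Types].xml" in names and "xl/workbook.xml" in names

def make_package(*part_names: str) -> bytes:
    """Build a tiny in-memory ZIP with empty parts, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name in part_names:
            package.writestr(name, "")
    return buffer.getvalue()

print(looks_like_xlsx(make_package("[Content_Types].xml", "xl/workbook.xml")))    # True
print(looks_like_xlsx(make_package("[Content_Types].xml", "word/document.xml")))  # False
```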

v5.3.4, 19 October 2025

Release v5.3.4 improves UI responsiveness and PII accuracy.
  • ⚡ Speed up UI during Analytics searches.

    v5.3.4 optimizes DB queries, for a smoother user experience in Analytics. Common queries over a single scan now return results up to 8x faster.

  • ✅ New PII type expiry_date, plus gender extracted from IDs.

    Look for a new expiry_date detector under the Personal category. Expiry dates are now automatically extracted from ID scans, including from images and PDFs. This makes it easier to determine whether a leaked passport / ID is still active or has already expired.

  • 🎯 Improved PII detection and linking in ID scans.

    This includes more accurate PII detections from images and PDFs, including linking them to the right person in the Person Cards® report.

  • ✅ Apply exclusions immediately during scanning and re-classification.

    Previously, PII Tools always detected and stored all PII, and merely hid excluded PII in the dashboard and in generated reports. To actually remove the excluded PII from PII Tools permanently, users had to click the "Apply permanently" button in their Exclusions tab.

    In v5.3.4, this behaviour changes as follows, to be more convenient for users:

    • In new scans, exclusions are applied immediately during scanning. This means the detected PII is not even stored if it matches any active exclusion rule.
    • In existing (old) scans, the "Reclassify" operation now assigns severity based only on non-excluded PII, ignoring all PII that matches any exclusion rule. Use the "Reclassify" button to apply new exclusions to existing scans.
  • 🎯 Improved accuracy of PII extracted from OCR (scanned documents).

  • 🎯 Improved PII detection accuracy from XML files.

  • 🎯 Try harder to process invalid email attachments.

    In particular, PII Tools previously failed to open and scan email attachments that declared the wrong content type in their email metadata (such as text/plain for a PDF attachment).

    PII Tools now ignores the supplied metadata, and makes its own content type determination based on the attachment's actual byte content.

  • 🎯 Improve PII detection accuracy from Word documents.

  • 🎯 Update the bundled NIST NSRL dataset to the latest version 2025.03.1.

  • 🎯 Improve precision and recall of all detectors, in particular "name", "address", "bank_account", "ssn" and "email".

  • 🔴 Fixed RBAC bug where the Superadmin user was not able to remediate files from other users' scans.

  • 🔴 Fixed bug where text from PDF annotations sometimes failed to extract for atypical PDFs.

  • 🔴 Several minor fixes, improving PII Tools stability and robustness.
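
In v5.3.4, excluded PII is dropped before it is stored. The following is a small Python sketch of that behaviour. The `Detection` and `ExclusionRule` types are hypothetical, as is the idea of a rule being a PII type plus a regular expression over the detected value. PII Tools' actual exclusion rule format is not described in these notes.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Detection:
    pii_type: str  # e.g. "email"; type names here are illustrative
    value: str

@dataclass(frozen=True)
class ExclusionRule:
    pii_type: str  # hypothetical: the rule applies to one PII type...
    pattern: str   # ...and to values that fully match this regex

    def matches(self, detection: Detection) -> bool:
        return (detection.pii_type == self.pii_type
                and re.fullmatch(self.pattern, detection.value) is not None)

def filter_before_storing(detections: list[Detection], rules: list[ExclusionRule]) -> list[Detection]:
    """Keep only detections that match no active exclusion rule."""
    return [d for d in detections if not any(rule.matches(d) for rule in rules)]

rules = [ExclusionRule("email", r".*@acme\.com")]
found = [Detection("email", "info@acme.com"), Detection("email", "jane@example.org")]
print(filter_before_storing(found, rules))
# [Detection(pii_type='email', value='jane@example.org')]
```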

v5.3.3, 2 October 2025

Release v5.3.3 comes with improved PII accuracy and security.
  • 🎯 Extract street, city, country and ZIP code (postal code) out of addresses.

    PII Tools has always detected global home addresses like 2201 C Street NW I Washington, DC 20520, but some customers asked to see the address components extracted individually: street as 2201 C Street NW I, city as Washington, state as DC and postal code (in the US: "ZIP code") as 20520. The goal is to simplify legal review and routing of generated PII Person Cards® reports, based on people's residences.

    v5.3.3 implements this functionality: each address may now produce several (overlapping) PII subcomponents.

  • ✅ New PII type postcode.

    Related to the above: there is now a new PII type postcode under the Personal PII category, to store extracted ZIP codes. You can look for specific ZIP codes in Analytics, and ZIP codes appear in reports (incl. the Person Cards® report), just like any other PII type.

  • 🔒 Hardened web server security.

    As an extremely security- and privacy-conscious application, PII Tools has had exactly zero incidents and breaches over the years. To keep it that way, we follow industry best practices and new security trends. v5.3.3 adds Content Security Policy (CSP), Permissions Policy and Strict-Transport-Security headers to its internal web server.

    These server changes should be completely transparent to normal users; if you see any changes in how PII Tools loads and works in your browser, please let us know.

  • 🎯 Detect Medicaid ID numbers.

    PII Tools now detects Medicaid numbers, in addition to Medicare numbers (two different health programs in the US). Detected Medicaid IDs appear under the existing Health ID PII type, under Medical.

  • 🎯 Improve precision and recall of "address" detector.

  • 🎯 Improve recall of "name" detector, by detecting more names in all-lowercase.

  • 🎯 Improve recall of "sexual preferences" detector.

  • 🎯 Improve precision of "date of birth" detector.

  • 🎯 Improve precision of PII extracted from the MRZ (machine-readable zone) of scanned IDs and passports.

  • ✅ Improve Person Cards® linking in complex emails and email screenshots (images of emails).

  • 🔴 Fixed bug where generated Excel reports sometimes opened with an error.

    The cause: when detected PII that started with the = character was written into an Excel sheet, Excel interpreted that cell as a formula and showed an error on opening the file.

    PII starting with = was a rare occurrence to begin with, but v5.3.3 fixes this completely by instructing Excel not to treat such cells as formulas.

  • 🔴 Several minor fixes to PII Tools documentation and UI.
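
The MRZ precision improvement above concerns PII read from the machine-readable zone of passports and IDs. One standard safeguard when reading an MRZ is the ICAO Doc 9303 check digit. Each character is mapped to a value (digits as themselves, A to Z as 10 to 35, the filler < as 0) and multiplied by the repeating weights 7, 3, 1; the sum is then taken modulo 10. The Python sketch below implements that public algorithm. These notes do not say whether or how PII Tools uses it internally, and the function names are hypothetical.

```python
DIGITS = set("0123456789")
WEIGHTS = (7, 3, 1)  # ICAO 9303 weighting, repeated across the field

def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value."""
    if ch in DIGITS:
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")

def mrz_check_digit(field: str) -> int:
    """Weighted sum of character values, modulo 10."""
    return sum(mrz_char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field)) % 10

def mrz_field_is_valid(field: str, check: str) -> bool:
    """True if `check` is the correct check digit for `field`."""
    return check in DIGITS and mrz_check_digit(field) == int(check)

print(mrz_check_digit("L898902C3"))       # 6
print(mrz_field_is_valid("120415", "9"))  # True
```

A value whose check digit does not verify is likely an OCR misread. Discarding or flagging such values is one plausible way a check like this could raise precision.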

v5.3.2, 12 September 2025, NEW DEVICE AGENT

Release v5.3.2 includes bug fixes and new features for user convenience.
  • 🎯 Improved PII detection in PDF forms.

    This includes PDF files with annotations and mixed-format PDFs, commonly used in PDF forms and templates.

  • ✅ Added "NOT" operator to Analytics.

    Analytics queries in v5.3.2 now allow filters like Owner NOT CONTAINS someone@outlook.com or Person name NOT EQUALS John Smith. This is in addition to existing AND and OR queries, allowing more complex review workflows and data discovery.

    To make querying even simpler, you can list multiple owners in a single NOT CONTAINS block, using a comma-delimited list: Owner NOT CONTAINS someone@outlook.com,someone_else@outlook.com,me@acme.org. This allows you to quickly select all files or emails that are not owned by any of the listed users, i.e. those owned by everyone else.

  • ⚡ Process remediation tasks concurrently.

    Previously, when a user launched multiple remediation tasks (Erase files, Redact files, etc.) at once, PII Tools processed those tasks sequentially, one after another, from oldest to newest.

    In v5.3.2, all unfinished remediation tasks are processed in parallel, in round-robin fashion.

  • ✅ Quarantine of emails from Exchange Online now creates files with the .eml extension, rather than .txt.

  • ✅ Auto-fix invalid Content-Type in emails.

    Whenever the Content-Type declared for an email attachment is invalid, PII Tools now guesses the correct Content-Type from the attachment's filename and content.

    Real-world emails are as messy as real-world PDFs, and often contain invalid or misleading metadata. This change provides additional robustness against malformed emails.

  • ✅ Clearer error message for Office lock files.

    When users edit a Microsoft Office file, Office automatically creates a temporary "lock file", named ~$original_filename.docx. Despite its .docx (or .xlsx) extension, this is not really a Word file, and scanning such lock files used to fail with Bad zip file.

    v5.3.2 changes this error message to This is a temporary lock file, not an Office document for clarity. This is the message you will see in your Audit logs going forward.

  • ✅ New OVA image for VMware.

    Customers who run PII Tools on VMware are encouraged to reinstall from the new OVA image published at https://support.pii-tools.com/vmware/.

    Existing installations built from the previous OVA continue to work. This new OVA is an optional upgrade for customers who wish to update the underlying OVA's operating system to the latest Ubuntu 24.04.3 LTS, to include security patches and dependency updates.

  • 🔴 Fixed bug where email attachments from Exchange Online sometimes failed to download after clicking the "Download original" UI button.

  • 🔴 Fixed cleanup_email in POST /stream_scan API.

    Stream Scan sometimes did not apply the cleanup_email API parameter correctly to submitted EML files, leading to an EML parsing error.

    This bug affected only Stream Scans, that is, EML files scanned via Quick Scan and its API equivalent POST /stream_scan. "Normal" batch scans, where folders and devices are scanned in bulk, were not affected.

  • 🔴 Fix scanning files with a long Windows path.

    v5.3.2 fixes a long-standing bug where long Windows paths (≥260 path characters, typically deeply nested paths) did not scan properly due to a Windows path length limitation.

    Such long paths now scan normally.

    This fix is optional and existing deployed agents continue to work. If your Windows systems do not contain such long paths, you can ignore this fix. To apply this fix, upgrade your deployed Windows device agents using the new Windows MSI installer bundled inside v5.3.2.
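
The switch from sequential to round-robin remediation in v5.3.2 can be pictured with a short Python sketch. It models only the ordering: each unfinished task contributes one item per turn, so a newly launched small task no longer waits behind a large older one. The task names and file lists are made up. The real PII Tools scheduler also runs work in parallel, which this sketch does not attempt.

```python
from collections import deque
from typing import Iterator

def round_robin(tasks: dict[str, list[str]]) -> Iterator[tuple[str, str]]:
    """Yield (task_id, item) pairs, one item from each unfinished task per turn."""
    queue = deque((task_id, deque(items)) for task_id, items in tasks.items())
    while queue:
        task_id, items = queue.popleft()
        if items:
            yield task_id, items.popleft()
        if items:  # task still has work: send it to the back of the line
            queue.append((task_id, items))

tasks = {"erase-1": ["a.pdf", "b.pdf", "c.pdf"], "redact-2": ["x.docx"]}
print(list(round_robin(tasks)))
# [('erase-1', 'a.pdf'), ('redact-2', 'x.docx'), ('erase-1', 'b.pdf'), ('erase-1', 'c.pdf')]
```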

v5.3.1, 12 August 2025

Release v5.3.1 improves scanning speed and accuracy.
  • ⚡ Faster scanning.

    In v5.3.1, we optimized all internal routines that access disk. Customers who deployed PII Tools onto a server with a slow disk (limited IOPS, limited throughput, shared disks…) can expect over 2x improvement in their scanning speed. Customers who already use a fast local SSD will see a more modest 20-50% speed improvement.

  • ⚡ Faster stream scanning.

    Workflows that use the PII Tools REST API to scan individual files in real time ("stream scanning", "Quick scan") will also see faster scan results in v5.3.1. The speedup is 5-20%, more for larger files such as complex PDFs and large images.

  • ✅ Allow partial scanning of SAS7BDAT files.

    Truncated SAS7BDAT files are now scanned all the way up to the point of truncation. Truncation typically happens when users specify the "Download at most N bytes" option when launching their scan, to only download and scan a smaller prefix of large files.

    With the SAS7BDAT format (binary data files for SAS analytics), scans of such partially downloaded files are possible and now supported by PII Tools.

  • 🎯 Improved accuracy of PII detection in tables (structured data).

    This includes improvements to attributing detected PII to individuals in the Person Cards® report.

  • 🎯 Improved recall of first name & last name detectors.

  • 🎯 Improved precision of the phone number detector.

  • 🔴 Fixed bug where edited Scan Schedules could not be saved in the UI.

v5.3.0, 6 August 2025

Release v5.3.0 improves existing workflows and user convenience.
  • ⚡ Optimize PII Tools start-up times.

    After a reboot or an upgrade, the PII Tools container could take upwards of 30 minutes to become fully available in some installations. In v5.3.0, we optimized data migrations and start-up checks, bringing the boot time back down to a few minutes.

  • ✅ Scan SAS files.

    SAS is a plaintext file format used to store SAS code and programs. PII Tools will now recognize this file format and scan its contents.

  • ✅ Add "Date created" field into the Excel Simple report.

    The Excel report now displays both "Last modified" and "Date created" for each exported file, enabling a simpler workflow for customers who needed this information.

  • ✅ Allow excluding individual users in Microsoft OneDrive scans.

    Previously, PII Tools users could enumerate which OneDrive users to scan (or say "scan all users"), but could not specify which users not to scan.

    In v5.3.0, this workflow is now enabled using -user1, user2, user3… in Root folders. This is the same familiar syntax used to exclude users from Exchange Online scans.

  • ✅ Efficient email filtering by "scan only if before" + "scan only if after" in Exchange Online.

    Exchange Online scans are now significantly faster during Delta Scans. Only emails that fall within the selected date window are actually considered for scanning, leading to quicker and more compact Exchange scans.

  • ⚡ Automatically split files with a large number of PII into several smaller sub-files in the inventory.

    When storing scan results for enormous CSV files, with potentially hundreds of millions of PII instances inside a single scanned file, PII Tools could run out of memory or trigger a database limit error.

    In v5.3.0, PII Tools will automatically spill outsized PII into several separate inventory objects, so that it can process even such large CSV files safely and completely and the result sets and RAM use remain manageable.

    To see scan results across an auto-split file, or to export a report across all its auto-split file parts, simply filter by Filename or path CONTAINS C:\some\path\superlarge.csv in Analytics. The individual sub-file locations will end in //partN, such as C:\some\path\superlarge.csv//part0, C:\some\path\superlarge.csv//part1, C:\some\path\superlarge.csv//part2 etc.

    The examples above talk about auto-splitting CSVs because this is the most common file format to produce such huge PII sets within a single file, due to its unlimited file size. But other file formats are treated exactly the same, including .xlsx, .sas7bdat, and will be automatically split on reaching ~100,000 PII per part.

  • 🎯 Improved accuracy of Person Cards® reports when linking PII from tabular data.

  • ✅ Matching scan names is now case-insensitive in Analytics.

    Filtering by scan name hr will now produce the same set of results as by HR.

  • 🔴 More robust PDF parsing.

  • 🔴 Fixed a bug where PII Tools failed to import state at boot time via the IMPORT_STATE environment variable.

  • 🔴 Fix RBAC in Analytics where users with the permission "Read scans (of any user)" could still not see scans of other users.

  • 🔴 Several minor UI and stability fixes.
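
The //partN naming used when a file is auto-split (see "Automatically split files with a large number of PII" above) can be illustrated in Python. `split_into_parts` is hypothetical. It assumes a fixed cap of 100,000 PII per part, whereas the notes say "~100,000", i.e. approximate. It also assumes that a file below the cap keeps its original path.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")
PART_SIZE = 100_000  # the notes give "~100,000"; an exact value is an assumption

def split_into_parts(path: str, pii: Sequence[T], part_size: int = PART_SIZE) -> Iterator[tuple[str, Sequence[T]]]:
    """Yield (location, chunk) pairs; oversized results get //partN suffixes."""
    if len(pii) <= part_size:
        yield path, pii  # small enough: stored under the original path
        return
    for index, start in enumerate(range(0, len(pii), part_size)):
        yield f"{path}//part{index}", pii[start:start + part_size]

parts = list(split_into_parts(r"C:\some\path\superlarge.csv", range(250_000)))
print([len(chunk) for _, chunk in parts])  # [100000, 100000, 50000]
# Every part name contains the original path, so a CONTAINS filter finds them all.
print(all(r"C:\some\path\superlarge.csv" in name for name, _ in parts))  # True
```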

v5.2.0, 16 July 2025

This v5.2.0 release comes packed with new features and improvements:
  • 🔒 Role-based Access Control (RBAC)

    All existing customers get a new module enabled: RBAC. PII Tools admins are now able to create new users, with a fine-grained permission system for "who-has-access-to-what".

    For example, an employee may be assigned the "User" role where they can manage their own scans, but not see or interact with scans of other users.

    The RBAC system covers, as separate configurable permissions:

    • Launching, duplicating, resuming and deleting scans

    • Accessing the scan results and exporting reports

    • Exclusions

    • Custom detectors

    • Custom classifiers

    • Remediations

    • Users

    • Roles

    Each user may be separately configured to have any combination of Create, Read, Update and Delete (CRUD) permissions to the above resources.

    As part of RBAC, user actions are tracked in an immutable "User Activity Log". Admins may download this Activity Log from their dashboard for an audit trail of past user activity.

    RBAC is a new feature of PII Tools. We release RBAC as beta in this v5.2.0 release to collect feedback. If your use case calls for users and user roles, and you have feedback or suggestions, we want to hear from you!

  • ✅ Allow pausing ongoing remediations.

  • ✅ Include "original file owner" in the remediation audit log.

  • ✅ Automatically detect files (binary columns) in SQL databases and scan them as files.

    Relevant to customers who keep files (PDFs, Word documents, Excel…) in their databases.

    No user action is needed during a SQL scan setup: PII Tools will now recognize such binary file blobs, auto-sniff their document format and scan them as files.

  • ✅ Track and show aggregate "All-time inventory statistics" in the UI.

    By popular demand, PII Tools now tracks the overall number of documents scanned, PII found and users (unique file owners) processed, across your whole PII Tools inventory. Even after a scan is deleted, it still contributes to these "aggregate statistics".

    To view the aggregate statistics of your PII Tools installation, click the "All-time stats" button on top of the inventory statistics window (the little square button to the left of your Analytics search bar).

  • ✅ Automatically guess the primary key column in SQL views and other key-less SQL tables.

    The motivation for this new feature is scanning SQL views, which in some databases, such as Microsoft SQL Server, do not have primary keys.

    Previously, PII Tools reported only the ordinal row number for such key-less database rows. With v5.2.0, PII Tools will look for a primary key automatically based on the table schema, and report this automatically determined primary key, for surgical PII redactions and reporting.

  • ✅ Allow selecting Redaction profile when running "Download redacted" remediation

    Previously, "Download redacted" would always redact using the default profile, which equals "mask all PII in full". Starting with v5.2.0, users are able to explicitly select the profile they wish to use for the redaction, such as "leave last four CC digits in cleartext; for all other PII types mask everything in full".

  • 🇪🇸🇸🇦 Added Spanish and Arabic to available UI translations

    All menus and texts in the PII Tools dashboard are now available in those languages too, for our international customers in the LATAM and Middle East regions.

    Look for the new language icons in the top-right corner of your dashboard, underneath the existing English and Portuguese flags, to change your interface language.

  • 🎯 More accurate partial redactions.

    Partial redactions are redactions where parts of the original PII are to be left visible, in clear text. For example, "Leave the first/last four characters of SSN and Credit cards unredacted".

    This improves on Redaction profiles introduced in v5.1.0, making redactions more flexible for specific customer workflows.

  • 🎯 Keep all original email headers in redacted emails (EML, MSG).

    Previously, PII Tools only kept select whitelisted headers during email redaction, such as "Subject:", "To:", "From:" etc. With v5.2.0, we changed the email redaction process to output all headers (possibly redacted) into the redacted email output.

  • 🎯 Include the file owner into raw.csv in Person Cards® report.

    The Person Cards® report is one of the most successful new additions to PII Tools from last year. As part of making Person Cards® applicable to specific customer workflows, PII Tools now includes a new "Owner" column in its detailed raw.csv report. This "Owner" marks the owner of the file where the linked PII was found, for easier cross-checking during human QC and reviews.

  • 📖 Improve documentation for the Google Drive (GDrive) OAuth flow

  • 🛠️ Include statistics of slave nodes in UI status.

    This improvement is relevant to cluster installations of PII Tools. Clicking the "Copy information to clipboard" button in the ⓘ status window now includes detailed statistics on each connected slave node, for easier maintenance and cluster inspection.

  • ⚡ Speed up deleting scans.

    An internal database change that allows about 3x faster deletion of scans from the PII Tools inventory. The improvement only applies to scans created after v5.2.0, but is fully backward compatible (existing pre-v5.2.0 scans continue to work transparently).

  • ⚡ Limit the number of analyzed table columns: scan only the first 100 columns.

    To avoid spending excessive resources on malformed documents and tables, PII Tools now scans only the first 100 columns in tables.

    Processing of table rows is unaffected by this change: still controlled by the How many rows to scan? configurable parameter as before.

  • 🎯 Improve precision of the name detector

  • 🎯 Improve recall of the SSN detector

  • 🎯 Improve precision of the routing number detector

  • 🎯 Improve recall of the password detector

  • 🎯 Improve precision of the health detector

  • 🎯 Improve precision of the WHO ICD detector

  • 🎯 Improve precision and recall of the phone number detector

  • 🎯 Improve precision of the username detector in tables (structured data)

  • 🎯 Improve accuracy of the credit card detector

  • 🎯 Improve accuracy of the driving license detector

  • 🔴 Fixed a bug where a Microsoft connector error while listing Sharepoint sites caused the whole scan to stop, skipping all remaining sites.

  • 🔴 Several fixes and improvements to remediation. The document remediation process is now more robust in the face of bad inputs and edge cases.

  • 🔴 Fixed Analytics filtering by "Created date" in Sharepoint and OneDrive scans.

  • 🔴 Fixed a problem where an internal temporary directory inside the PII Tools container could keep growing in size.

  • 🔴 A number of other minor UI and stability fixes.
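
Redaction profiles and partial redactions (see the "Download redacted" and "More accurate partial redactions" items above) can be modelled as a mapping from PII type to a transformation, with a fall-back that masks everything. The following is a Python sketch under several assumptions that are not PII Tools' documented behaviour: the PII type names, the X mask character, and the choice to leave separators such as spaces and dashes visible.

```python
from typing import Callable

Transform = Callable[[str], str]

def mask_all(value: str) -> str:
    """Mask every letter and digit; keep separators (an assumption)."""
    return "".join("X" if ch.isalnum() else ch for ch in value)

def keep_last(n: int) -> Transform:
    """Build a transform that leaves the last n letters/digits in clear text."""
    def transform(value: str) -> str:
        out: list[str] = []
        kept = 0
        for ch in reversed(value):
            if ch.isalnum() and kept < n:
                out.append(ch)
                kept += 1
            elif ch.isalnum():
                out.append("X")
            else:
                out.append(ch)
        return "".join(reversed(out))
    return transform

def redact(pii_type: str, value: str, profile: dict[str, Transform]) -> str:
    """Apply the profile's rule for pii_type, or mask in full if it has none."""
    return profile.get(pii_type, mask_all)(value)

# "Leave last four CC digits in cleartext; mask all other PII types in full."
CC_LAST_FOUR: dict[str, Transform] = {"credit_card": keep_last(4)}

print(redact("credit_card", "4111 1111 1111 1111", CC_LAST_FOUR))  # XXXX XXXX XXXX 1111
print(redact("name", "John Smith", CC_LAST_FOUR))                  # XXXX XXXXX
```

Keeping the profile as plain data (a dict of rules plus a default) mirrors the idea of storing a profile once and recalling it for each redaction run.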

v5.1.1, 22 March 2025

Important bug fixes and internal upgrades in this PII Tools v5.1.1 release:
  • 🛠️⚡ Internal database upgrade, for faster storage scanning and faster UI analytics.

    Please make sure you have >50% free disk space on your PII Tools server before initiating the upgrade.
    If your server has less than 50% free disk space, please contact support@pii-tools.com for assistance with this upgrade.

  • ✅ Extend health detectors (PHI) with common MRNs.

  • ✅ Improved remediation tab with additional information.

  • ✅ Allow scanning only selected users in GDrive scans.

    Previously, Google Drive scans allowed scanning either all users, or a single selected user, or a single selected drive folder.

    Starting with v5.1.1, there is now a new option "Scan drives of selected users?" when scanning GDrive with a service account. This option lets PII Tools users upload a list of primary emails of the GDrive users whose documents to scan.

    This list of primary emails may contain the wildcard pattern *, to match multiple GDrive users. For example, *@mydomain.com will match any primary email within the mydomain.com Google domain.

  • ✅ New config option: SCAN_WORKER_MAX_RAM.

    By default, PII Tools allows at most 2GB of RAM to be allocated inside a scan worker during scanning of any one file. Exceeding this peak 2GB quota will lead to the document being marked as FAILED.

    This 2GB limit is typically plenty, but for specific workloads, customers with enough RAM are now able to override this limit inside their docker-compose.yml config file. For example: set - SCAN_WORKER_MAX_RAM=3 for a max-3GB-per-scan-worker RAM limit.

  • ๐ŸŽฏ New regional ID detectors (under Nationalโ†’SSN) for the Gulf countries: Saudi Arabia, Emirates, Bahrain, Qatar.

  • 🎯 Improved recall of phone numbers detector.

  • 🎯 Improved recall of names detector.

  • 🎯 Improved precision of credit cards detector.

  • 🎯 Improved precision of passwords detector.

  • ⚡ Improved performance of Google Drive scanning and redaction.

  • 🔴 Improved parsing of broken / malformed Word documents.

  • 🔴 Improved parsing of Excel sheets with invalid dates.

  • 🔴 Improved robustness of redaction of CSV, EML and MSG files.

  • 🔴 A number of minor UI and stability fixes.
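The wildcard selection of GDrive users can be illustrated with a short Python sketch. It assumes that * behaves like a shell-style wildcard (so ? and [...] would also be special here) and that matching ignores letter case; select_users is a hypothetical helper, not part of PII Tools, and the product's actual matching rules may differ.

```python
from fnmatch import fnmatchcase

def select_users(primary_emails, patterns):
    """Return the primary emails matched by at least one uploaded pattern.

    Assumptions: patterns use shell-style wildcards ('*' matches any run of
    characters) and comparison is case-insensitive.
    """
    lowered = [p.strip().lower() for p in patterns if p.strip()]
    return [
        email
        for email in primary_emails
        if any(fnmatchcase(email.lower(), pattern) for pattern in lowered)
    ]

users = ["alice@mydomain.com", "bob@otherdomain.com", "Carol@MyDomain.com"]
print(select_users(users, ["*@mydomain.com"]))
# ['alice@mydomain.com', 'Carol@MyDomain.com']
```

Because the pattern must match the whole address, *@mydomain.com does not pick up subdomains such as x@sub.mydomain.com.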

v5.1.0, 17 February 2025

PII Tools v5.1.0 introduces new redaction options. It also brings improved scanning speed and PII accuracy.
  • ✅ Redaction profiles and partial PII redactions

    Previously, whenever redacting PII within a document, PII Tools would mask the whole detected PII. For example, it would black out the entire name or SSN inside PDFs and images, or replace that PII with XXXXX in Excel, Word, CSVs, emails and plain text files.

    In v5.1.0 we made redaction more flexible. Users are now able to refine redactions by leaving some parts of the PII unredacted (e.g. "when redacting SSNs, leave the last four SSN digits in the clear").

    We are working on extending redactions further, allowing PII pseudonymization ("replace John Smith by Adam Wirth") and tokenization ("replace John Smith by an opaque token, which can later be turned back into John Smith using a secret key").

    To tame all this flexibility in a user-friendly workflow, v5.1.0 introduces "Redaction profiles", a new tab in the main UI menu. Redaction profiles let users define rules for which transformation should apply to which PII type, store those rules persistently in a Redaction Profile, and recall that profile whenever launching a redaction operation.

    For more information please review https://documentation.pii-tools.com/#redaction-profiles.

  • ✅ Display additional details inside the Remediations tab, to make navigating Remediations tasks and progress easier.

  • 🎯 Improved PII recall for Word documents.

    Specifically, we improved parsing of more exotic documents with nested tables, control fields and text boxes.

  • 🎯 Improved PII recall from complex Excel tables.

  • 🎯 Improved recall of driving license image detector.

  • 🎯 Improved precision of credit card detector.

  • 🎯 Improved precision of SSN detector.

  • ⚡ Optimize processing of macro-enabled Word documents.
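A Redaction Profile is, conceptually, a mapping from PII type to transformation. The Python sketch below models that idea, including the "leave the last four SSN digits in the clear" rule. The PII type names, PROFILE, and the helper functions are hypothetical illustrations, not PII Tools' actual profile format.

```python
from typing import Callable, Dict

def mask_all(value: str) -> str:
    # Whole-PII redaction: replace every character with X.
    return "X" * len(value)

def keep_last_digits(n: int) -> Callable[[str], str]:
    # Partial redaction: mask every digit except the last n;
    # separators such as '-' stay visible.
    def transform(value: str) -> str:
        total = sum(ch.isdigit() for ch in value)
        seen = 0
        out = []
        for ch in value:
            if ch.isdigit():
                seen += 1
                out.append(ch if seen > total - n else "X")
            else:
                out.append(ch)
        return "".join(out)
    return transform

# Hypothetical profile: PII type -> transformation.
PROFILE: Dict[str, Callable[[str], str]] = {
    "SSN": keep_last_digits(4),
    "NAME": mask_all,
}

def redact(pii_type: str, value: str, profile: Dict[str, Callable[[str], str]] = PROFILE) -> str:
    # PII types without an explicit rule fall back to full masking.
    return profile.get(pii_type, mask_all)(value)

print(redact("SSN", "123-45-6789"))  # XXX-XX-6789
print(redact("NAME", "John Smith"))  # XXXXXXXXXX
```

Storing the rules as data rather than code is what allows a profile to be saved once and recalled for later redaction runs.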

v5.0.2, 2 February 2025

PII Tools v5.0.2 optimizes common remediation and detection workflows.
  • ✅ Allow erasing whole archives.

    The menu for Secure Erase now contains a new option for "What to do about files in archives?": "Erase the whole archive".

  • ✅ Add option to scan Salesforce sandbox environments.

    Previously, customers could scan production Salesforce environments. To support internal deployment processes, PII Tools now also allows scans over test (sandbox) Salesforce environments.

    PII Tools documentation has been updated to clarify the necessary Salesforce authentication steps: https://documentation.pii-tools.com/#salesforce

  • ✅ Improve in-place redaction of Excel sheets.

  • 🎯 Improve PII detection in tabular formats (Excel, CSV, tables in PDFs, tables in images…) that do not really contain tables.

    Some documents may contain semi-structured data formatted visually into a table, where the information is nevertheless not really tabular: the data is not organized into genuine rows or columns.

    PII Tools now does a better job understanding such jumbled or rotated tables, and extracting PII from them correctly.

  • 🎯 Improve precision of detecting credit cards in tables.

  • 🎯 Implement Singapore NRIC (national ID) detector under the National→SSN category.

  • 🎯 Improve accuracy of the SSN, home address, person name, IP, and health detectors.

  • ⚡ Optimize processing of Office documents (Word, PowerPoint, …)

v5.0.1, 25 December 2024

PII Tools v5.0.1 builds on the last major 5.0.0 release to further improve remediations and detection accuracy.
  • ✅ Allow remediating the same objects again.

    A common feature request was to run another remediation action over the same files again. Previously, this was not possible in PII Tools: once you remediated files (for example, redacted or deleted them), these objects disappeared from the PII Tools inventory, so you could not access or remediate them again. This was a problem for files that FAILED to remediate, for example due to a permission or access error. The only solution was to re-scan the affected files to bring them back into the PII Tools inventory so they could be remediated again. While workable, this process was tedious and user-unfriendly.

    In v5.0.1, you can re-remediate files even when they are not in the inventory any more. Simply submit a list of files to remediate using the "Remediate from locations" button in your Remediations UI tab. There is no need to rescan such files first.

  • 🎯 Improve OCR, especially from PDF forms.

  • 🎯 Improve precision of the Financial cheques detector.

  • ⚡ Improve speed of the remediation process.

  • 🔴 Fix bug where Office Word documents that contained the "CURRENT DATE" dynamic placeholder could produce different PII positions depending on when the scan happened, leading to redaction mismatch.

    For example, "CURRENT_DATE John Smith" would detect "John Smith" at one PII position offset when scanned on "10 January 2024", and at another offset when scanned on "1 March 2024", because the two rendered dates have different lengths.
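The offset shift above is simple arithmetic. This Python sketch assumes the date field renders as plain text immediately followed by a space and the name; name_offset is a hypothetical helper used only for illustration.

```python
def name_offset(rendered_date: str, name: str = "John Smith") -> int:
    # Simulate document text that starts with a dynamic date field,
    # followed by a space and then the name.
    text = f"{rendered_date} {name}"
    return text.index(name)

print(name_offset("10 January 2024"))  # 16
print(name_offset("1 March 2024"))     # 13
```

A redaction that reused the offset recorded when the date rendered as "10 January 2024" would therefore land three characters away from the name once the same document rendered as "1 March 2024".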

v5.0.0, 10 December 2024

PII Tools v5.0.0 revolves around new redaction features and optimizations. As the 5.0 version tag indicates, this is a major product rewrite, resulting in increased performance and stability:
  • ✅ Redact Office documents.

    PII Tools will now redact native Word (.doc, .docx, etc.) and Excel (.xls, .xlsx, .xlsb, .xlsm, …) documents, including in-place redaction. The resulting file is exactly like the original, except all (or all selected) PII is redacted out.

    This means that customers who purchased the Remediation module are able to surgically redact all major file types now, from PDF to Word and Excel, to CSV, text and emails.

  • ✅ Redact emails.

    Similar to the above, PII Tools is now able to redact MSG and EML emails, including attachments. The output EML email will be re-assembled from the redacted original email headers, email body and attachments.

  • ✅ Added a new report type: "Duplicates".

    This report lists all file duplicates in your inventory, in an easy-to-process CSV format.

    The duplicates are determined based on their file content, regardless of filenames and other metadata, so that duplicates are captured reliably across all scanned storages (device, S3, email attachments, OneDrive…).

  • ✅ Allow uploading a list of files to scan (or skip) in "Accept filenames" and "Reject filenames".

    This feature was requested by customers who have a list of documents (filenames) to scan but do not know their exact locations.

    You can now collect a list of all such filenames or partial file paths into a text file, with one filename per line, and then upload that text file into "What file to scan?" while configuring your scan.

    The "Accept filenames" text file may be arbitrarily large, containing tens or hundreds of thousands of filenames inside.

    This functionality is similar to the existing "Upload Root folder from file" button. But unlike "Root folders" which accepts exact full paths such as C:\folder\subFolder\my_file.pdf, this "Accept filenames" solution is more flexible because it allows partial location matches such as my_file.pdf or subFolder/my_file.pdf, without needing to specify the full absolute path.

  • 🎯 Improve accuracy of SSN detector.

  • 🎯 Improve parsing of MSG files.

  • 🎯 Optimize parsing of larger XLSB (binary Excel) spreadsheets.

  • ⚡ Improve speed and robustness of the remediation process.

  • ⚡ Improve handling of "mailbox concurrency limits" in Microsoft Graph API (affects Exchange Online scans).

  • ⚡ Optimized the "Forget" remediation action.

  • 🔴 Extend the orange "scan status" badge to scans with at least one failed (incomplete) archive.

    Previously, an archive that errored out during scanning could lead to a "green" icon for the whole scan. The new "orange" icon reflects more clearly the status of the scan, i.e. some files could not even be accessed by PII Tools, and thus the total SCANNED/SKIPPED/FAILED numbers may be incomplete.

    As usual, check the FAILED items in your scan's Audit log to see what failed exactly and why.

  • 🔴 Several smaller fixes to UI and server.
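The partial-location matching of "Accept filenames" can be sketched in Python as a comparison of trailing path components. This sketch assumes that entries match whole components (so file.pdf does not match my_file.pdf), that both \ and / count as separators, and that matching is case-sensitive; build_index and accepted are hypothetical helpers, not PII Tools APIs.

```python
from collections import defaultdict

def path_parts(path):
    # Treat Windows (\) and POSIX (/) separators alike and drop empty parts.
    return [part for part in path.replace("\\", "/").split("/") if part]

def build_index(accept_list_text):
    # One filename or partial path per line; blank lines are ignored.
    # Entries are grouped by their final component (the filename), so each
    # scanned path is compared only against entries with the same filename.
    index = defaultdict(list)
    for line in accept_list_text.splitlines():
        tail = path_parts(line.strip())
        if tail:
            index[tail[-1]].append(tail)
    return index

def accepted(full_path, index):
    # A path is accepted if some entry equals its trailing components.
    parts = path_parts(full_path)
    if not parts:
        return False
    return any(parts[-len(tail):] == tail for tail in index.get(parts[-1], []))

index = build_index("my_file.pdf\nsubFolder/report.docx\n")
print(accepted(r"C:\folder\subFolder\my_file.pdf", index))  # True
print(accepted(r"C:\folder\subFolder\report.docx", index))  # True
print(accepted(r"C:\other\report.docx", index))             # False
```

Indexing entries by their final component keeps each lookup cheap even when the uploaded list contains hundreds of thousands of lines.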