v5.5.0, 23 March 2026
This new remediation feature lets you combine the power and flexibility of PII Tools' scanning and investigative workflows with Microsoft's complex world of data governance and DLP.
-
โ Label documents in Microsoft Purview
If your organization has an active Purview license and you want to label sensitive documents inside your Microsoft tenancy, get in touch with our PII Tools support to help you set up this new integration feature.
-
🎯 Improved accuracy of the ID and passport detector.
-
🎯 Improved accuracy of the name detector.
-
🎯 Improved accuracy of the home address detector.
-
🎯 Improved accuracy of the SSN and tax ID detectors.
-
🎯 Improved accuracy of the routing number and bank account detectors.
-
๐ Regular security update of internal dependencies.
-
🔴 Fixed parsing error on PDFs that contain annotations but no text.
-
🔴 Fixed parsing error on emails whose entire body consists of a single BOM character.
-
⚡ Several minor UI fixes, updates and improvements.
v5.4.3, 10 February 2026
-
🎯🔴 Improved accuracy (reduced both false positives and false negatives) of ID scans, Passport scans, Residency Cards and scanned Driver Licenses.
This is a critical fix for a regression introduced in v5.4.1 that caused multi-page PDFs to sometimes produce incorrect ID detections and miss actual IDs.
-
โ Deduplicate email addresses entered into the Root Folder of Exchange scans.
Previously, a Root folder value of `john@acme.com,john@acme.com` would scan John's mailbox twice. Starting with v5.4.3, PII Tools deduplicates such comma-separated Root folder values. This new behaviour is in line with how root folders are already deduplicated in other connectors and in Root Folders uploaded from a file.
-
โ Extract the "Last modified" date from emails extracted from PST archives, allowing filtering of such emails by date in Analytics.
-
⚡ Optimize size of DB activity logs, for a smaller installation footprint.
-
🔴 Fixed boot-up error when the `LOCK_OUT_AFTER_FAILED_LOGIN_ATTEMPTS` server configuration option was set.
-
🔴 Fixed bug that caused scans launched by uploading hundreds of Root folders from a file to fail.
-
🔴 Fixed bug where remediations that were actively running when the PII Tools server was restarted could become "stuck".
Beyond fixing the original bug, v5.4.3 will also automatically check for such "unfinished" remediations on startup and resume them. No user action is required after the upgrade.
-
🔴 Several smaller improvements to scanning accuracy, robustness and the web UI.
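The Root folder deduplication described in this release can be pictured with a short sketch. This is only an illustration, not the PII Tools implementation: the function name `dedupe_root_folders` is hypothetical, and treating entries as exact, case-sensitive strings is an assumption.

```python
def dedupe_root_folders(value: str) -> list[str]:
    """Split a comma-separated Root folder value and drop repeated entries."""
    entries = [part.strip() for part in value.split(",")]
    # dict.fromkeys keeps the first occurrence of each key and preserves order.
    return list(dict.fromkeys(entry for entry in entries if entry))


print(dedupe_root_folders("john@acme.com,john@acme.com"))
```

Keeping the first occurrence preserves the order in which mailboxes were entered, so the scan order stays predictable.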
v5.4.1 = v5.4.2, 22 January 2026
There is also the usual round of improvements to PII accuracy, UI workflow and bug fixes.
-
โ Redesign Drill-down and Risk Summary reports.
The popular Drill-down and Risk Summary reports underwent a major facelift. These reports are now easier on the eye while retaining their packed information content.
-
⚡ Speed up remediations.
v5.4.1 optimized the performance of document remediation. This means faster processing of documents during Secure Erase and Redact, at a cost of slightly increased server RAM footprint.
The speedup is 3x. If you rely on redacting or erasing large numbers of documents, reach out to support@pii-tools.com to improve on this 3x factor further.
-
โ Reduce size of redacted PDFs.
PII Tools will now automatically detect documents that are primarily black-and-white after redaction, and automatically compress the redacted output using JBIG2. This leads to 2-50x smaller PDF output. This space saving is critical in workflows that rely on redacting emails with PDF attachments, and other PDF-heavy repositories.
-
โ Scan HEIC image format.
`.heic` is a popular photo format, especially in the Apple ecosystem.
-
โ Don't scan O365 system drives and libraries.
Previously, PII Tools would scan through the storage when scanning OneDrive, including hidden (system) drives and libraries.
In v5.4.1, PII Tools automatically detects and ignores such system drives and libraries.
-
🎯 Automatically detect MSG files that are in fact EML.
Two popular formats for storing emails (incl. attachments) as files are MSG and EML. Some 3rd party tools store emails in the EML format while giving the file an incorrect `.msg` extension.
PII Tools v5.4.1 now automatically detects such mis-named email files and scans them appropriately.
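As a rough sketch of how such mis-named files can be told apart by content: genuine MSG files are OLE2 compound files and start with a fixed 8-byte signature, while EML files are plain text that starts with a header line such as `Received:` or `From:`. The function name `sniff_email_format` and the header check are assumptions made for illustration; how PII Tools actually detects the format is not described in these notes.

```python
# Signature shared by all OLE2 compound files, the container used by Outlook .msg
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_email_format(head: bytes) -> str:
    """Guess 'msg', 'eml' or 'unknown' from the first bytes of a file."""
    if head.startswith(OLE2_MAGIC):
        return "msg"
    # EML is text: the first line should look like "Header-Name: value".
    first_line = head.removeprefix(b"\xef\xbb\xbf").split(b"\n", 1)[0]
    name, sep, _ = first_line.partition(b":")
    if sep and name and name.replace(b"-", b"").isalnum():
        return "eml"
    return "unknown"


print(sniff_email_format(b"Received: from mail.example.com\r\nSubject: hi\r\n"))
```

The file extension is ignored on purpose, since the extension is exactly what is unreliable in this case.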
-
🎯 Improved accuracy of SSNs and tax IDs.
-
🎯 Improved accuracy of phone number detector.
-
🎯 Improved accuracy of medical MBI detector.
-
🔴 Fixed bug where MSG archives could not be deleted from OneDrive during Secure Erase.
-
🔴 Fixed bug with linking nested scan IDs in Person Cards® report.
-
🔴 Clearer error message for files that FAILED due to Microsoft's Information Rights Management (IRM) and Azure Information Protection (AIP).
-
🔴 Fixed sniffing of file type from extension-less XLSX documents inside nested file archives.
-
🔴 Removed a few rarely-used parameters from the "Launch Scan" web form. The UI form is now more streamlined, while these removed parameters are still available through the API for advanced users.
v5.4.0, 7 December 2025
-
โ Support redacting PII within HTML, including inside "rich text bodies" of emails.
This feature enables new workflows for clients who scan and redact emails.
PII Tools previously redacted emails by redacting the plaintext email body and attachments. This means the redacted email showed "without formatting" in email clients, as plaintext only.
v5.4.0 implements full HTML redaction, which means emails are now redacted completely: PII is redacted inside the rich-text email body, the plain-text email body, the headers and the attachments. The outcome is a surgically redacted email with the same structure as the original email, natively viewable in Outlook and other email clients.
-
โ Support automated deployment of macOS agents via Jamf Pro.
To scan fleets of macOS devices, PII Tools now comes with Jamf Pro scripts to automate mass deployment. The scripts support both macOS running on the older Intel architecture (x86_64) and macOS running on the newer "Apple Silicon" ARM architecture (M1, M2, M3, M4 etc).
-
โ New PII type "Residency card" (under `National-Scan ID`).
-
🎯 Improved PII detection inside scanned IDs (images).
This includes more accurate extraction of PII from various ID scans, passports and licenses, as well as better linking of such extracted PII to individuals in the Person Cards® report.
-
🎯 Improved detection accuracy on SSNs.
-
🎯 Improved accuracy on non-US phone numbers.
-
🎯 Improved accuracy on passport numbers.
-
🎯 Improved accuracy on home addresses.
-
🎯 Improved accuracy on bank account numbers, credit card numbers and cheque scans.
-
โ Added support for scanning Parquet files.
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It is used by data analysts, and Parquet files often contain PII. PII Tools will now natively scan Parquet files of any size.
-
⚡ Reduce RAM footprint in scans with many uploaded root folders.
Previously, whenever users submitted a large text file listing the specific files to scan via the "Upload from root folders" button, the scan initialization would consume a lot of RAM. This could lead to out-of-memory errors on servers with little spare RAM.
In v5.4.0, this niche "Upload from root folders" use case was optimized and users can now upload arbitrarily large root folder lists without risk of OOM.
-
🔴 Fixed UI bug where browsing through the pages in the Remediations tab would sometimes not display results past the first page.
-
🔴 Fixed bug where import of PII Tools state (a previously exported `json` state file) would sometimes fail with an error.
-
🔴 Fixed sniffing of file type from extension-less XLSX documents inside nested file archives.
v5.3.4, 19 October 2025
-
⚡ Speed up UI during Analytics searches.
v5.3.4 optimized DB queries, leading to a smoother user experience in Analytics. Common queries over a single scan now return results up to 8x faster.
-
โ New PII types `expiry_date` and `gender` extracted from IDs.
Look for a new `expiry_date` detector under the Personal category. Expiry dates are now automatically extracted out of ID scans, including from images and PDFs. This makes it easier to determine whether a leaked passport / ID is still active or has already expired.
-
🎯 Improved PII detection and linking in ID scans.
This includes more accurate PII detections from images and PDFs, including linking them to the right person in the Person Cards® report.
-
โ Apply exclusions immediately during scanning and re-classification.
Previously, PII Tools always detected and stored all PII, while not displaying excluded PII in the dashboard and in generated reports. To actually remove the excluded PII from PII Tools permanently, users had to click the "Apply permanently" button in their Exclusions tab.
In v5.3.4, this behaviour changes to be more convenient for users as follows:
- In new scans, exclusions are applied immediately during scanning. This means the detected PII is not even stored if it matches any active exclusion rule.
- In existing (old) scans, the "Reclassify" operation now assigns severity based only on non-excluded PII, ignoring all PII that matches any exclusion rule. Use the "Reclassify" button to apply new exclusions to existing scans.
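The new "apply exclusions while scanning" behaviour boils down to filtering detections before they are stored. The sketch below is purely illustrative: `Detection`, `ExclusionRule` and the regex-based matching are hypothetical stand-ins, since the real PII Tools exclusion rule format is not described in these notes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    pii_type: str
    value: str


@dataclass(frozen=True)
class ExclusionRule:
    pii_type: str
    pattern: str  # hypothetical: a regular expression matched against the whole value

    def matches(self, detection: Detection) -> bool:
        return (detection.pii_type == self.pii_type
                and re.fullmatch(self.pattern, detection.value) is not None)


def keep_non_excluded(detections, rules):
    """Drop every detection matched by an active rule, before anything is stored."""
    return [d for d in detections if not any(r.matches(d) for r in rules)]


rules = [ExclusionRule("email", r".*@acme\.com")]
found = [Detection("email", "info@acme.com"), Detection("email", "jane@example.org")]
print(keep_non_excluded(found, rules))
```

Only the surviving detections would then be written to the inventory and counted towards severity.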
-
🎯 Improved accuracy of PII extracted from OCR (scanned documents).
-
🎯 Improved PII detection accuracy from XML files.
-
🎯 Try harder to process invalid email attachments.
In particular, PII Tools previously failed to open and scan email attachments that declared the wrong content type in their email metadata (such as `text/plain` for a PDF attachment).
PII Tools now ignores the supplied metadata, and makes its own content type determination based on the attachment's actual byte content.
-
🎯 Improve PII detection accuracy from Word documents.
-
🎯 Update the bundled NIST NSRL dataset to the latest version `2025.03.1`.
-
🎯 Improve precision and recall of all detectors, in particular "name", "address", "bank_account", "ssn" and "email".
-
🔴 Fixed RBAC bug where the Superadmin user was not able to remediate files from other users' scans.
-
🔴 Fixed bug where text from PDF annotations sometimes failed to extract for atypical PDFs.
-
🔴 Several minor fixes, improving PII Tools stability and robustness.
v5.3.3, 2 October 2025
-
🎯 Extract street, city, country and ZIP code (postal code) out of addresses.
PII Tools has always detected global home addresses like `2201 C Street NW I Washington, DC 20520`, but some customers reported wanting to see the address components extracted individually: street as `2201 C Street NW I`, city as `Washington`, state as `DC` and postal code (in the US: "ZIP code") as `20520`. The goal is to simplify legal review and routing of generated PII Person Cards® reports, based on people's residences.
v5.3.3 implements this functionality: each address may now produce several (overlapping) PII subcomponents.
-
โ New PII type `postcode`.
Related to the above: there is now a new PII type `postcode` under the `Personal` PII category, to store extracted ZIP codes. You can look for specific ZIP codes in Analytics, and the ZIP codes appear in reports (incl. the Person Cards® report), just like any other PII type.
-
๐ Hardened web server security.
As an extremely security- and privacy-conscious application, PII Tools has had exactly zero incidents and breaches over the years. To keep it that way, we follow industry best practices and new security trends. v5.3.3 adds Content Security Policy (CSP), Permissions Policy and Strict-Transport-Security headers to its internal web server.
These server changes should be completely transparent to normal users; if you see any changes in how PII Tools loads and works in your browser, please let us know.
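For readers unfamiliar with these three headers, here is a minimal sketch of a web server that sends them, using only the Python standard library. The header names are the standard ones; the header values and the `HardenedHandler` class are illustrative assumptions and say nothing about the policies PII Tools actually ships.

```python
from http.server import BaseHTTPRequestHandler

# Example values only; a real deployment tunes each policy to its own needs.
SECURITY_HEADERS = {
    "Content-Security-Policy": "default-src 'self'",
    "Permissions-Policy": "camera=(), microphone=(), geolocation=()",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
}


class HardenedHandler(BaseHTTPRequestHandler):
    """Answers every GET with a tiny body plus the security headers."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        for name, value in SECURITY_HEADERS.items():
            self.send_header(name, value)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass the handler to `http.server.HTTPServer(("127.0.0.1", 8000), HardenedHandler)` and inspect the response headers in the browser's developer tools.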
-
🎯 Detect Medicaid ID numbers.
PII Tools now detects Medicaid numbers, in addition to Medicare numbers (two different health programs in the US). Detected Medicaid IDs appear under the existing `Health ID` PII type, under `Medical`.
-
🎯 Improve precision and recall of "address" detector.
-
🎯 Improve recall of "name" detector, by detecting more names in all-lowercase.
-
🎯 Improve recall of "sexual preferences" detector.
-
🎯 Improve precision of "date of birth" detector.
-
🎯 Improve precision of PII extracted from MRZ (machine-readable-zone) of scanned IDs and passports.
-
โ Improve Person Cards® linking in complex emails and email screenshots (images of emails).
-
🔴 Fixed bug where generated Excel reports sometimes opened with an error.
Detected PII that started with the `=` character, when output into an Excel sheet, caused Excel to interpret that cell as a formula and therefore show an error on opening.
PII starting with `=` was a rare occurrence to begin with, but v5.3.3 fixes this completely by instructing Excel not to treat such cells as formulas.
-
🔴 Several minor fixes to PII Tools documentation and UI.
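The Excel report fix in this release concerns values that a spreadsheet would parse as formulas. One generic mitigation, shown here only as a sketch and not necessarily what PII Tools does internally, is to mark such values as literal text before export by prefixing them with an apostrophe. The helper name `as_literal_text` is hypothetical.

```python
def as_literal_text(value: str) -> str:
    """Prefix values starting with '=' so a spreadsheet shows them as text."""
    if value.startswith("="):
        return "'" + value
    return value


print(as_literal_text("=SUM(A1:A2)"))
print(as_literal_text("John Smith"))
```

Report writers that build `.xlsx` files directly can achieve the same effect by writing the cell with an explicit string type instead of letting the library guess.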
v5.3.2, 12 September 2025, NEW DEVICE AGENT
-
🎯 Improved PII detection in PDF forms.
This includes PDF files with annotations and mixed-format PDFs, commonly used in PDF forms and templates.
-
โ Added "NOT" operator to Analytics.
Analytics queries in v5.3.2 now allow filters like `Owner NOT CONTAINS someone@outlook.com` or `Person name NOT EQUALS John Smith`. This is in addition to existing AND and OR queries, allowing more complex review workflows and data discovery.
To make querying even simpler, you can list multiple owners in a single `NOT CONTAINS` block, using a comma-delimited list: `Owner NOT CONTAINS someone@outlook.com,someone_else@outlook.com,me@acme.org`. This allows you to quickly select all files or emails that are not owned by the listed users, i.e. emails by everyone else.
-
⚡ Process remediation tasks concurrently.
Previously, when a user launched multiple remediation tasks (Erase files, Redact files, etc) at once, PII Tools processed those tasks in oldest-to-newest order, one after another, sequentially.
In v5.3.2, all unfinished remediation tasks are processed in round-robin fashion, in parallel.
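The difference between the old sequential order and round-robin processing can be sketched as follows. This is a single-threaded illustration of the interleaving only; the real scheduler runs work in parallel, and the function name and task layout here are assumptions.

```python
from collections import deque


def run_round_robin(tasks: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Process one pending file per task in turn until every task is finished."""
    queue = deque((name, deque(files)) for name, files in tasks.items())
    order = []
    while queue:
        name, pending = queue.popleft()
        if not pending:
            continue  # task had nothing to do
        order.append((name, pending.popleft()))  # stand-in for the real remediation work
        if pending:
            queue.append((name, pending))  # unfinished tasks rejoin the back of the queue
    return order


print(run_round_robin({"erase": ["a", "b", "c"], "redact": ["x"]}))
```

With sequential processing the small `redact` task would wait for all of `erase` to finish; with round-robin it completes after the first turn.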
-
โ Quarantine of emails from Exchange Online now creates files with the `.eml` extension, rather than `.txt`.
-
โ Auto-fix invalid `Content-Type` in emails.
In emails, whenever the `Content-Type` encoding provided for an email attachment is invalid, PII Tools now guesses the correct `Content-Type` from the attachment's filename and content.
Real-world emails are as messy as real-world PDFs, and often contain invalid or misleading metadata. This change provides additional robustness against malformed emails.
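A minimal sketch of such a fallback, assuming "invalid" means "not of the form `type/subtype`": keep a well-formed declared value, otherwise look at a few well-known magic bytes, then at the filename. The function name, the regular expression and the tiny magic-byte table are illustrative assumptions, not the PII Tools logic.

```python
import mimetypes
import re

# Loose check for the "type/subtype" shape of a MIME type.
VALID_MIME = re.compile(r"[A-Za-z0-9!#$&^_.+-]+/[A-Za-z0-9!#$&^_.+-]+")

# A few well-known file signatures; a real sniffer knows many more.
MAGIC = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"PK\x03\x04": "application/zip",
}


def fix_content_type(declared: str, filename: str, content: bytes) -> str:
    """Return the declared type if well-formed, else guess from content and name."""
    declared = declared.strip()
    if VALID_MIME.fullmatch(declared):
        return declared.lower()
    for magic, mime in MAGIC.items():
        if content.startswith(magic):
            return mime
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


print(fix_content_type("garbage", "scan.dat", b"%PDF-1.7 ..."))
```

Content is checked before the filename because, as the MSG/EML case in v5.4.1 shows, extensions can be just as misleading as headers.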
-
โ Clearer error message for Office lock files.
When users edit a Microsoft Office file, Office will automatically create a temporary "lock file", named `~$original_filename.docx`. Despite its `.docx` (or `.xlsx`) extension, this is not really a Word file and scanning such lock files used to fail with `Bad zip file`.
v5.3.2 changes this error message to `This is a temporary lock file, not an Office document` for clarity. This is the message you will see in your Audit logs going forward.
-
โ New OVA image for VMware.
Customers who run PII Tools on VMware are encouraged to reinstall from the new OVA image published at https://support.pii-tools.com/vmware/.
Existing installations built from the previous OVA continue to work. This new OVA is an optional upgrade for customers who wish to update the underlying OVA's operating system to the latest Ubuntu 24.04.3 LTS, to include security patches and dependency updates.
-
🔴 Fixed bug where email attachments from Exchange Online sometimes failed to download after clicking the "Download original" UI button.
-
🔴 Fixed `cleanup_email` in `POST /stream_scan` API.
EML files submitted to Stream Scan sometimes did not apply the `cleanup_email` API parameter correctly, leading to an EML parsing error.
This bug affected only Stream Scans, that is, EML files scanned via Quick Scan and its API equivalent `POST /stream_scan`. "Normal" batch scans, where folders and devices are scanned in bulk, were not affected.
-
🔴 Fix scanning files with a long Windows path.
v5.3.2 fixes a long-standing bug where long Windows paths (≥260 path characters, typically deeply nested paths) did not scan properly due to a Windows filesystem path length limitation.
Such long paths now scan normally.
This fix is optional and existing deployed agents continue to work. If your Windows systems do not contain such long paths, you can ignore this fix. To apply this fix, upgrade your deployed Windows device agents using the new Windows MSI installer bundled inside v5.3.2.
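The release note does not say how the agent now handles long paths. For background, a common way to get past the 260-character limit is the Windows extended-length prefix `\\?\`; the sketch below shows that general technique only, with a hypothetical helper name, and uses `ntpath` so it can be tried on any operating system.

```python
import ntpath  # Windows path rules, importable on any OS

EXTENDED_PREFIX = "\\\\?\\"    # the literal characters \\?\
UNC_PREFIX = "\\\\?\\UNC\\"    # extended form for \\server\share paths


def to_extended_length(path: str) -> str:
    """Return an absolute Windows path in extended-length form."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already extended, nothing to do
    # Windows skips normalization for prefixed paths, so normalize first.
    path = ntpath.normpath(path)
    if path.startswith("\\\\"):
        return UNC_PREFIX + path[2:]
    return EXTENDED_PREFIX + path


print(to_extended_length("C:/data/deep/../file.txt"))
```

The helper assumes its input is already an absolute path; relative paths cannot be used with the prefix.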
v5.3.1, 12 August 2025
-
⚡ Faster scanning.
In v5.3.1, we optimized all internal routines that access disk. Customers who deployed PII Tools onto a server with a slow disk (limited IOPS, limited throughput, shared disks…) can expect over 2x improvement in their scanning speed. Customers who already used a fast local SSD disk will see a more modest 20-50% speed improvement.
-
⚡ Faster stream scanning.
Workflows that use the PII Tools REST API to scan individual files in real time ("stream scanning", "Quick scan") will also see faster scan results in v5.3.1. The speedup is 5-20%, with larger gains for larger files, such as complex PDFs and large images.
-
โ Allow partial scanning of SAS7BDAT files.
Incomplete truncated SAS7BDAT files are now scanned all the way to the point of file truncation. Truncation typically happens when users specify the "Download at most N bytes" option when launching their scan, to only download and scan a smaller prefix of large files.
With the SAS7BDAT format (binary data files for SAS analytics), scans of such partially downloaded files are possible and now supported by PII Tools.
-
🎯 Improved accuracy of PII detection in tables (structured data).
This includes improvements to attributing detected PII to individuals in the Person Cards® report.
-
🎯 Improved recall of first name & last name detectors.
-
🎯 Improved precision of the phone number detector.
-
🔴 Fixed bug where edited Scan Schedules could not be saved in the UI.
v5.3.0, 6 August 2025
-
⚡ Optimize PII Tools start up times.
After a reboot or an upgrade, the PII Tools container could take upward of 30 minutes to become fully available in some installations. In v5.3.0, we optimized data migrations and start up checks, bringing the boot time back down to a few minutes.
-
โ Scan SAS files.
SAS is a plaintext file format used to store SAS code and programs. PII Tools will now recognize this file format and scan its contents.
-
โ Add "Date created" field into the Excel Simple report.
The Excel Report now displays both "Last modified" and "Date created" for each exported file, enabling a simpler workflow for customers who needed this information.
-
โ Allow excluding individual users in Microsoft OneDrive scans.
Previously, PII Tools users could enumerate which OneDrive users to scan (or say "scan all users"), but could not say which users to not scan.
In v5.3.0, this workflow is now enabled using `-user1, user2, user3…` in Root folders. This is the same familiar syntax used to exclude users from Exchange Online scans.
-
โ Efficient email filtering by "scan only if before" + "scan only if after" in Exchange Online.
Exchange Online scans are now significantly faster during Delta Scans. Only emails that fall within the selected date window are actually considered for scanning, leading to quicker and more compact Exchange scans.
-
⚡ Automatically split files with a large number of PII into several smaller sub-files in the inventory.
When storing scan results for enormous CSV files, with potentially hundreds of millions of PII instances inside a single scanned file, PII Tools could run out of memory or trigger a database limit error.
In v5.3.0, PII Tools will automatically spill outsized PII into several separate inventory objects, so that it can process even such large CSV files safely and completely, while result sets and RAM use remain manageable.
To see scan results across an auto-split file, or to export a report across all its auto-split file parts, simply filter by `Filename or path CONTAINS C:\some\path\superlarge.csv` in Analytics. The individual sub-file locations will end in `//partN`, such as `C:\some\path\superlarge.csv//part0`, `C:\some\path\superlarge.csv//part1`, `C:\some\path\superlarge.csv//part2` etc.
The examples above talk about auto-splitting CSVs because this is the most common file format to produce such huge PII sets within a single file, due to its unlimited file size. But other file formats are treated exactly the same, including `.xlsx` and `.sas7bdat`, and will be automatically split on reaching ~100,000 PII per part.
-
🎯 Improved accuracy of Person Cards® reports when linking PII from tabular data.
-
โ Matching scan names is now case-insensitive in Analytics.
Filtering by scan name `hr` will now produce the same set of results as by `HR`.
-
🔴 More robust PDF parsing.
-
🔴 Fixed a bug where PII Tools failed to import state at boot time via the `IMPORT_STATE` environment variable.
-
🔴 Fix RBAC in Analytics where users with the permission "Read scans (of any user)" could still not see scans of other users.
-
🔴 Several minor UI and stability fixes.
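The `//partN` naming used by the auto-split feature in this release can be illustrated with a small generator. This is a sketch of the naming scheme only: the function name is hypothetical, the hard cut at exactly `max_per_part` items is a simplification of the "~100,000 PII per part" threshold, and whether small files receive a `//part0` suffix at all is not stated in these notes.

```python
from itertools import islice


def split_into_parts(location: str, pii_items, max_per_part: int = 100_000):
    """Yield (part_location, chunk) pairs, numbering parts from 0."""
    iterator = iter(pii_items)
    part = 0
    # Pull at most max_per_part items at a time, so the full set never sits in RAM.
    while chunk := list(islice(iterator, max_per_part)):
        yield f"{location}//part{part}", chunk
        part += 1


for part_location, chunk in split_into_parts(r"C:\some\path\superlarge.csv", range(5), max_per_part=2):
    print(part_location, chunk)
```

Because every part location starts with the original path, a single `Filename or path CONTAINS` filter matches all parts at once.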
v5.2.0, 16 July 2025
-
๐ Role-based Access Control (RBAC)
All existing customers get a new module enabled: RBAC. PII Tools admins are now able to create new users, with a fine-grained permission system for "who-has-access-to-what".
For example, an employee may be assigned the "User" role where they can manage their own scans, but not see or interact with scans of other users.
The RBAC system covers, as separate configurable permissions:
- Launching, duplicating, resuming and deleting scans
- Accessing the scan results and exporting reports
- Exclusions
- Custom detectors
- Custom classifiers
- Remediations
- Users
- Roles
Each user may be separately configured to have any combination of Create, Read, Update and Delete (CRUD) permissions to the above resources.
As part of RBAC, user actions are tracked in an immutable "User Activity Log". Admins may download this Activity Log from their dashboard for an audit trail of past user activity.
RBAC is a new feature of PII Tools. We release RBAC as beta in this v5.2.0 release to collect feedback. If your use case calls for users and user roles, and you have feedback or suggestions, we want to hear from you!
โ Allow pausing ongoing remediations.
โ Include "original file owner" in the remediation audit log.
-
โ Automatically detect files (binary columns) in SQL databases and scan them as files.
Relevant to customers who keep files (PDFs, Word documents, Excel…) in their databases.
No user action is needed during a SQL scan setup โ PII Tools will now recognize such binary file blobs, auto-sniff their document format and scan them as files.
-
โ Track and show aggregate "All-time inventory statistics" in the UI.
By popular demand, PII Tools now tracks the overall number of documents scanned, PII found and users (unique file owners) processed, across your whole PII Tools inventory. Even after a scan is deleted, it still contributes to these "aggregate statistics".
To view the aggregate statistics of your PII Tools installation, click the "All-time stats" button on top of the inventory statistics window (the little square button to the left of your Analytics search bar).
-
โ Automatically guess the primary key column in SQL views and other key-less SQL tables.
The motivation for this new feature is scanning SQL views, where in some databases such as Microsoft SQL Server, views do not have primary keys.
Previously, PII Tools reported only the ordinal row number for such key-less database rows. With v5.2.0, PII Tools will look for a primary key automatically based on the table schema, and report this automatically determined primary key, for surgical PII redactions and reporting.
-
โ Allow selecting Redaction profile when running "Download redacted" remediation
Previously, "Download redacted" would always redact using the default profile, which equals "mask all PII in full". Starting v5.2.0 users are able to explicitly select the profile they wish to use for the redaction, such as "leave last four CC digits in cleartext; for all other PII types mask everything in full".
🇪🇸🇸🇦 Added Spanish and Arabic to available UI translations
All menus and texts in the PII Tools dashboard are now available in those languages too, for our international customers in the LATAM and Middle East regions.
Look for the new language icons in the top-right corner of your dashboard, underneath the existing English and Portuguese flags, to change your interface language.
-
🎯 More accurate partial redactions.
Partial redactions are redactions where parts of the original PII are to be left visible, in clear text. For example, "Leave the first/last four characters of SSN and Credit cards unredacted".
This improves on Redaction profiles from the previous release, making redactions more flexible for specific customer workflows.
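A partial redaction rule such as "leave the last four characters unredacted" can be sketched like this. The function name, the `X` mask character and the choice to keep separators such as dashes and spaces visible are illustrative assumptions, not the behaviour of any particular Redaction profile.

```python
def mask_keep_last(value: str, keep: int = 4, mask_char: str = "X") -> str:
    """Mask every letter and digit except the last `keep` ones; keep separators."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - keep, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


print(mask_keep_last("123-45-6789"))
print(mask_keep_last("4111 1111 1111 1111"))
```

Counting only letters and digits keeps the rule stable regardless of how the SSN or card number happens to be formatted in the document.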
-
🎯 Keep all original email headers in redacted emails (EML, MSG).
Previously, PII Tools only kept select whitelisted headers during email redaction, such as "Subject:", "To:", "From:" etc. With v5.2.0, we changed the email redaction process to output all headers (possibly redacted) into the redacted email output.
-
🎯 Include the file owner into raw.csv in Person Cards® report.
The Person Cards® report is one of the most successful new additions to PII Tools from last year. As part of making Person Cards® applicable to specific customer workflows, PII Tools now includes a new "Owner" column in its detailed raw.csv report. This "Owner" marks the owner of the file where the linked PII was found, for easier cross-checking during human QC and reviews.
-
๐ Improve documentation for the Google Drive (GDrive) OAuth flow
-
🛠️ Include statistics of slave nodes in UI status.
This improvement is relevant to cluster installations of PII Tools. Clicking the "Copy information to clipboard" button in the โ status window now includes detailed statistics on each connected slave node, for easier maintenance and cluster inspection.
-
⚡ Speed up deleting scans.
An internal database change that allows about 3x faster deletion of scans from the PII Tools inventory. The improvement only applies to scans created after v5.2.0, but is fully backward compatible (existing pre-v5.2.0 scans continue to work transparently).
-
⚡ Limit the number of analyzed table columns: scan only the first 100 columns.
To avoid spending excessive resources on malformed documents and tables, PII Tools now scans only the first 100 columns in tables.
Processing of table rows is unaffected by this change: it is still controlled by the `How many rows to scan?` configurable parameter, as before.
🎯 Improve precision of the name detector
🎯 Improve recall on SSN detector
🎯 Improve precision of routing numbers detector
🎯 Improve recall of password detector
🎯 Improve precision of health detector
🎯 Improve precision of WHO ICD detector
🎯 Improve precision and recall of the phone number detector
🎯 Improve precision of the username detector in tables (structured data)
🎯 Improve accuracy of the credit card detector
🎯 Improve accuracy of the driving license detector
🔴 Fixed a bug where a Microsoft connector error while listing SharePoint sites caused the whole scan to stop, skipping all remaining sites.
🔴 Several fixes and improvements to remediation. The document remediation process is now more robust in the face of bad inputs and edge cases.
🔴 Fixed Analytics filtering by "Created date" in SharePoint and OneDrive scans.
🔴 Fixed a problem where the internal temporary directory inside the PII Tools container could grow in size.
🔴 A number of other minor UI and stability fixes.
v5.1.1, 22 March 2025
-
🛠️⚡ Internal database upgrade, for faster storage scanning and faster UI analytics.
Please make sure you have >50% free disk space on your PII Tools server before initiating the upgrade.
If your server has less than 50% free disk space, please contact support@pii-tools.com for assistance with this upgrade.
-
โ Extend health detectors (PHI) with common MRNs.
-
โ Improved remediation tab with additional information.
-
โ Allow scanning only selected users in GDrive scans.
Previously, Google Drive scans allowed scanning either all users, or a single selected user, or a single selected drive folder.
Starting with v5.1.1, there is now a new option "Scan drives of selected users?" when scanning GDrive with a service account. This option lets PII Tools users upload a list of primary emails of the GDrive users whose documents to scan.
This list of primary emails may contain the wildcard pattern `*`, to match multiple GDrive users. For example, `*@mydomain.com` will match any primary email within the mydomain.com Google domain.
-
โ New config option: SCAN_WORKER_MAX_RAM.
By default, PII Tools allows at most 2GB of RAM to be allocated inside a scan worker during scanning of any one file. Exceeding this peak 2GB quota will lead to the document being marked as FAILED.
This 2GB limit is typically plenty, but for specific workloads, customers with enough RAM are now able to override this limit inside their `docker-compose.yml` config file. For example: set `- SCAN_WORKER_MAX_RAM=3` for a max-3GB-per-scan-worker RAM limit.
-
🎯 New regional ID detectors (under National-SSN) for the Gulf countries: Saudi Arabia, Emirates, Bahrain, Qatar.
-
🎯 Improved recall of phone numbers detector.
-
🎯 Improved recall of names detector.
-
🎯 Improved precision of credit cards detector.
-
🎯 Improved precision of passwords detector.
-
⚡ Improved performance of Google Drive scanning and redaction.
-
🔴 Improved parsing of broken / malformed Word documents.
-
🔴 Improved parsing of Excel sheets with invalid dates.
-
🔴 Improved robustness of redaction of CSV, EML and MSG files.
-
🔴 A number of minor UI and stability fixes.
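The `*` wildcard matching for GDrive primary emails introduced in this release can be approximated with the standard `fnmatch` module. This is a sketch under assumptions: the function name is hypothetical, matching is made case-insensitive here by choice, and `fnmatch` also understands `?` and `[...]`, which the release note does not mention.

```python
from fnmatch import fnmatchcase


def select_users(primary_emails: list[str], patterns: list[str]) -> list[str]:
    """Return the emails that match at least one pattern such as '*@mydomain.com'."""
    lowered = [p.strip().lower() for p in patterns if p.strip()]
    return [
        email
        for email in primary_emails
        if any(fnmatchcase(email.lower(), pattern) for pattern in lowered)
    ]


users = ["ann@mydomain.com", "bob@other.org", "Cy@MyDomain.com"]
print(select_users(users, ["*@mydomain.com"]))
```

A pattern without any `*` simply matches that one address, so exact emails and wildcards can be mixed in the same uploaded list.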
v5.1.0, 17 February 2025
-
โ Redaction profiles and partial PII redactions
Previously, whenever redacting PII within a document, PII Tools would mask the whole detected PII. For example, it would black out the entire name or SSN inside PDFs and images, or replace that PII by `XXXXX` in Excel, Word, CSVs, emails and plain text files.
In v5.1.0 we made redaction more flexible. Users are now able to refine redactions by leaving some parts of the PII unredacted (e.g. "when redacting SSNs, leave the last four SSN digits in the clear").
We are working on extending redactions further, allowing PII pseudonymization ("replace John Smith by Adam Wirth") and tokenization ("replace John Smith by an opaque token, which can later be turned back into John Smith using a secret key").
To tame all this flexibility in a user-friendly workflow, v5.1.0 introduces "Redaction profiles", a new tab in the main UI menu. Redaction profiles let users define rules for what-transformation-should-happen-to-what-PII-type, store those rules persistently in a Redaction Profile, and then recall that profile whenever launching a redaction operation.
For more information please review https://documentation.pii-tools.com/#redaction-profiles.
-
โ Display additional details inside the Remediations tab, to make navigating Remediations tasks and progress easier.
-
🎯 Improved PII recall for Word documents.
Specifically, we improved parsing of more exotic documents with nested tables, control fields and text boxes.
-
🎯 Improved PII recall from complex Excel tables.
-
🎯 Improved recall of driving license image detector.
-
🎯 Improved precision of credit card detector.
-
🎯 Improved precision of SSN detector.
-
⚡ Optimize processing of macro-enabled Word documents.
v5.0.2, 2 February 2025
-
✅ Allow erasing whole archives.
The menu for Secure Erase now contains a new option for "What to do about files in archives?": "Erase the whole archive".
-
✅ Add option to scan Salesforce sandbox environments.
Previously, customers could scan production Salesforce environments. To support internal deployment processes, PII Tools now also allows scans over test (sandbox) Salesforce environments.
PII Tools documentation has been updated to clarify the necessary Salesforce authentication steps: https://documentation.pii-tools.com/#salesforce
-
✅ Improve in-place redaction of Excel sheets.
-
🎯 Improve PII detection in tabular formats (Excel, CSV, tables in PDFs, tables in images…) that do not really contain tables.
Some documents contain semi-structured data that is visually formatted as a table but is not really tabular: the data forms neither proper rows nor proper columns.
PII Tools now does a better job of understanding such jumbled or rotated tables and extracting PII from them correctly.
-
🎯 Improve precision of detecting credit cards in tables.
-
🎯 Implement Singapore NRIC (national ID) detector under the National-SSN category.
-
🎯 Improve accuracy of the SSN, home address, person name, IP, and health detectors.
-
⚡ Optimize processing of Office documents (Word, PowerPoint, …)
v5.0.1, 25 December 2024
-
✅ Allow remediating the same objects again.
A common feature request for remediation was the ability to run another remediation action over the same files. Previously, this was not possible in PII Tools: once you remediated files, for example by redacting or deleting them, these objects disappeared from the PII Tools inventory, so you could not access or remediate them again. This was a problem for files that FAILED to remediate, for example due to a permission or access error. The only solution was to re-scan the affected files to bring them back into the PII Tools inventory, so they could be remediated again. While workable, this was a tedious, user-unfriendly process.
In v5.0.1, you can re-remediate files even when they are no longer in the inventory. Simply submit a list of files to remediate using the "Remediate from locations" button in your Remediations UI tab. There is no need to rescan such files first.
-
🎯 Improve OCR, especially from PDF forms.
-
🎯 Improve precision of the Financial cheques detector.
-
⚡ Improve speed of the remediation process.
-
🔴 Fix bug where Office Word documents that contained the "CURRENT DATE" dynamic placeholder could produce different PII positions depending on when the scan happened, leading to redaction mismatches.
For example, "CURRENT_DATE John Smith" would detect "John Smith" at one PII offset when scanned on "10 January 2024", and at another PII offset when scanned on "1 March 2024", because the rendered dates differ in length.
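The offset drift in this example can be reproduced with a few lines of Python. This is only an illustration of the arithmetic, not of how PII Tools parses Word documents; the helper `pii_offset` is a hypothetical name, and the placeholder is modelled as plain text substitution.

```python
def pii_offset(template: str, rendered_date: str, pii: str) -> int:
    """Return the character offset of `pii` after the date placeholder is expanded."""
    text = template.replace("CURRENT_DATE", rendered_date)
    return text.index(pii)


template = "CURRENT_DATE John Smith"
offset_jan = pii_offset(template, "10 January 2024", "John Smith")
offset_mar = pii_offset(template, "1 March 2024", "John Smith")

# The same name lands three characters apart, so a redaction computed
# on one day would cover the wrong span on the other.
print(offset_jan, offset_mar)  # 16 13
```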
v5.0.0, 10 December 2024
-
✅ Redact Office documents.
PII Tools will now redact native Word (.doc, .docx, etc.) and Excel (.xls, .xlsx, .xlsb, .xlsm, …) documents, including in-place redaction. The resulting file is exactly like the original, except that all (or all selected) PII is redacted.
This means that customers who purchased the Remediation module can now surgically redact all major file types, from PDF to Word and Excel, to CSV, text and emails.
-
✅ Redact emails.
Similarly, PII Tools is now able to redact MSG and EML emails, including attachments. The output EML email is re-assembled from the redacted original email headers, email body and attachments.
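As a rough picture of what re-assembling an EML from redacted parts involves, here is a sketch using only Python's standard `email` package. It is not PII Tools' code: the function `reassemble_eml`, the sample addresses and the generic `application/octet-stream` attachment type are all assumptions, and the inputs are taken to be already redacted.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def reassemble_eml(headers, body, attachments):
    """Build EML bytes from already-redacted headers, body and attachments.

    `attachments` is a list of (filename, payload_bytes) tuples.
    """
    msg = EmailMessage()
    for name, value in headers.items():
        msg[name] = value
    msg.set_content(body)
    for filename, payload in attachments:
        msg.add_attachment(payload, maintype="application",
                           subtype="octet-stream", filename=filename)
    return msg.as_bytes()


raw = reassemble_eml(
    {"From": "sender@example.com", "To": "recipient@example.com",
     "Subject": "Invoice for XXXXX"},
    "Hello XXXXX,\nplease find the invoice attached.\n",
    [("invoice.txt", b"Customer: XXXXX")],
)

# Parse the result back to check that headers and attachments survived.
parsed = BytesParser(policy=policy.default).parsebytes(raw)
print(parsed["Subject"])
print([part.get_filename() for part in parsed.iter_attachments()])
```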
-
✅ Added a new report type: "Duplicates".
This report lists all file duplicates in your inventory, in an easy-to-process CSV format.
The duplicates are determined based on their file content, regardless of filenames and other metadata, so that duplicates are captured reliably across all scanned storages (device, S3, email attachments, OneDrive…).
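Content-based duplicate detection is commonly done by grouping files on a cryptographic digest of their bytes. The sketch below shows that general technique in Python; whether PII Tools uses SHA-256 or some other fingerprint is not stated in these notes, and the function names and sample locations are hypothetical.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Fingerprint a file purely by its bytes (assumption: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(objects: dict) -> list:
    """Group locations whose content is byte-identical, ignoring names and metadata."""
    groups = defaultdict(list)
    for location, data in objects.items():
        groups[content_digest(data)].append(location)
    return [sorted(locs) for locs in groups.values() if len(locs) > 1]


# Hypothetical inventory spanning several storages.
inventory = {
    "s3://bucket/contract.pdf": b"same bytes",
    "C:\\docs\\copy of contract.pdf": b"same bytes",
    "onedrive://hr/notes.txt": b"different bytes",
}
print(find_duplicates(inventory))
```

Because only the content is hashed, the two copies of the contract are grouped together even though their names and storages differ.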
-
✅ Allow uploading a list of files to scan (or skip) in "Accept filenames" and "Reject filenames".
This feature was requested by customers who have a list of documents (filenames) to scan, but don't know their exact locations.
You can now collect a list of all such filenames or partial file paths into a text file, with one filename per line, and then upload that text file into "What file to scan?" while configuring your scan.
The "Accept filenames" text file may be arbitrarily large, containing tens or hundreds of thousands of filenames inside.
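One plausible way to read "filenames or partial file paths" is as a match on the trailing path components of each scanned file. The Python sketch below implements that interpretation only; the exact matching rules PII Tools applies (case sensitivity, wildcards, separator handling) are not described here, so `matches_accept_list` and its behaviour are assumptions.

```python
def _components(path: str) -> list:
    """Split a path into its components, treating both slash styles alike."""
    return [p for p in path.replace("\\", "/").split("/") if p]


def matches_accept_list(full_path: str, accept_entries: list) -> bool:
    """True if any entry equals the trailing components of `full_path`."""
    parts = _components(full_path)
    for entry in accept_entries:
        wanted = _components(entry)
        if wanted and parts[-len(wanted):] == wanted:
            return True
    return False


path = r"C:\folder\subFolder\my_file.pdf"
print(matches_accept_list(path, ["my_file.pdf"]))            # True
print(matches_accept_list(path, ["subFolder/my_file.pdf"]))  # True
print(matches_accept_list(path, ["other/my_file.pdf"]))      # False
```

Matching whole components rather than raw string suffixes avoids accidental hits such as `file.pdf` matching `my_file.pdf`.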
This functionality is similar to the existing "Upload Root folder from file" button. But unlike "Root folders", which accepts exact full paths such as C:\folder\subFolder\my_file.pdf, this "Accept filenames" solution is more flexible because it allows partial location matches such as my_file.pdf or subFolder/my_file.pdf, without needing to specify the full absolute path.
-
🎯 Improve accuracy of SSN detector.
-
🎯 Improve parsing of MSG files.
-
🎯 Optimize parsing of larger XLSB (binary Excel) spreadsheets.
-
⚡ Improve speed and robustness of the remediation process.
-
⚡ Improve handling of "mailbox concurrency limits" in Microsoft Graph API (affects Exchange Online scans).
-
⚡ Optimized the "Forget" remediation action.
-
🔴 Extend the orange "scan status" badge to scans with (at least one) failed incomplete archive.
Previously, an archive that errored out during scanning could lead to a "green" icon for the whole scan. The new "orange" icon reflects more clearly the status of the scan, i.e. some files could not even be accessed by PII Tools, and thus the total SCANNED/SKIPPED/FAILED numbers may be incomplete.
As usual, check the FAILED items in your scan's Audit log to see what failed exactly and why.
-
🔴 Several smaller fixes to UI and server.