Migrate content from Shared Drives to Office 365 Cloud and enhance content discovery

ACES has created several in-house tools to expedite content migration from SharePoint and shared drives to SharePoint Online and Office 365. We are partners and resellers for most third-party solutions that provide auto-classification and auto-tagging capabilities. Of these, Concept Searching’s enterprise-class technology framework is the best tool for preparing content for migration. The platform automatically classifies and tags content with metadata, and it organizes and manages that metadata in taxonomies that are integrated as Term Sets into the SharePoint Term Store. Terms can be user-defined or automatically generated by Concept Searching’s artificial intelligence (AI) algorithms, which identify and extract meaning from structured, semi-structured, and unstructured content.

Concept Searching’s underlying technology consists of several components, including conceptClassifier and the Taxonomy Manager. These components work in concert to discover multi-word concepts within documents that reside in both shared drives and SharePoint. At the core of the platform is compound term processing, a breakthrough technology that identifies and weights multi-word concepts using purely statistical analysis, independent of vocabulary, language, or grammatical style. To leverage compound terms in ranking algorithms, it is necessary to understand the incremental value of a higher-order term (for example, "records management") relative to its lower-order component parts ("records" and "management"). Concept Searching handles this with clue weights that are maintained in the Taxonomy Manager. One of the main features of the Taxonomy Manager is its ease of use: it gives content subject matter experts (SMEs) the ability to create and maintain taxonomies stored in the SharePoint Online Term Store. Non-technical SMEs can manage, monitor, and modify the taxonomies as terminology and end-user needs mature. This makes the Concept Searching platform the most scalable and adaptive metadata tagging and classification platform available, and the only one that integrates with the SharePoint Term Store.
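
To make the idea of clue weights concrete, the toy scorer below gives a compound term its own weight on top of the weights of its component words. It is a minimal sketch, not Concept Searching's algorithm: plain n-gram counting stands in for compound term processing, and the clue list and weights (CLUES) are hypothetical.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngram_counts(tokens: list[str], max_n: int) -> Counter:
    """Count every 1..max_n word sequence, joined with single spaces."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

def clue_score(text: str, clues: dict[str, float]) -> float:
    """Sum weight * occurrences for every clue (single- or multi-word) found in the text.

    Clue keys are assumed to be lower-case, single-space-separated words.
    """
    max_n = max(len(clue.split()) for clue in clues)
    counts = ngram_counts(tokenize(text), max_n)
    return sum(weight * counts[clue] for clue, weight in clues.items())

# Hypothetical clue weights for a "Records Management" topic: the compound
# term carries extra weight on top of its component words.
CLUES = {"records": 1.0, "management": 0.5, "records management": 4.0}
```

With these weights, "Records management policy" scores 5.5 (1.0 + 0.5 + 4.0), while "Management of medical records" scores only 1.5: both words occur, but not as the compound concept, so the higher-order clue adds nothing.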

Sensitive information can be automatically tagged with a high degree of accuracy and precision based on the information contained in each piece of content. For example, most searches for personally identifiable information (PII) such as Social Security numbers (SSNs) return many false positives while missing several documents that contain valid SSNs. The advanced AI algorithms within conceptClassifier have a long track record of classifying and tagging only the appropriate content. Once documents and content in the shared drives are properly curated and tagged, they are uploaded to SharePoint. Although the metadata is embedded in the documents, it can also be automatically added to metadata columns in SharePoint lists and libraries to enable advanced sorting and filtering within the lists and libraries themselves. By enriching lists and libraries with metadata columns that draw on terms in the SharePoint Term Store, advanced search refiners can be added to search results. This provides an Amazon-like experience: you select “Shoes” and are then offered choices such as “Men’s or Women’s”, “Sport or Casual”, “Color”, and “Size”, and you can see all the results for any combination of filters.
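
The false-positive problem described above is easy to see with a rule-based detector. The sketch below is not conceptClassifier's AI; it is a minimal pattern-plus-validation check, assuming SSNs are written as NNN-NN-NNNN or NNN NN NNNN, that discards numbers the Social Security Administration never issues (area 000, 666, or 900 through 999; group 00; serial 0000). The function names are hypothetical.

```python
import re

# Three digits, two digits, four digits, separated consistently by a hyphen or
# a space, and not embedded in a longer run of digits.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})([- ])(\d{2})\2(\d{4})(?!\d)")

def is_plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Reject number blocks that are never issued as SSNs."""
    area_num = int(area)
    if area_num == 0 or area_num == 666 or area_num >= 900:
        return False
    return group != "00" and serial != "0000"

def find_ssns(text: str) -> list[str]:
    """Return candidate SSNs, normalized to NNN-NN-NNNN, that pass the checks above."""
    hits = []
    for match in SSN_PATTERN.finditer(text):
        area, _separator, group, serial = match.groups()
        if is_plausible_ssn(area, group, serial):
            hits.append(f"{area}-{group}-{serial}")
    return hits
```

Even with these checks, any valid-looking number in the same format (an account or case number, for instance) still matches, and SSNs written without separators are missed. That gap is why context-aware classification is needed for high precision.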

Concept Searching can be used for more than just enriching shared drive content with metadata. In the SharePoint Online environment, it can continuously evaluate content as it is added or modified, automatically classifying and tagging (or re-classifying and re-tagging) that content. Concept Searching can also automatically notify interested parties of new or updated content associated with Terms or Term Sets of interest. Because new documents are classified and tagged as soon as they are created, added, or modified in SharePoint, those tags can trigger workflows that automate a wide variety of processes and information flows within the organization. This is core to modernizing knowledge management and transforming an organization's business operations from poor to powerful.
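
The notification step amounts to matching a document's new tags against what each user has subscribed to. The sketch below assumes a simple in-memory model: Subscription, expand_interests, and notify_targets are hypothetical names, the term store is a plain dictionary from Term Set name to its terms, and actually sending the alert (email, Teams, or a workflow) is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Subscription:
    """A user's interest in individual terms and/or whole Term Sets (hypothetical model)."""
    user: str
    terms: set[str] = field(default_factory=set)
    term_sets: set[str] = field(default_factory=set)

def expand_interests(sub: Subscription, term_store: dict[str, set[str]]) -> set[str]:
    """Turn Term Set subscriptions into the individual terms those sets contain."""
    interests = set(sub.terms)
    for term_set in sub.term_sets:
        interests |= term_store.get(term_set, set())
    return interests

def notify_targets(doc_terms: set[str], subs: list[Subscription],
                   term_store: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each user to the terms of interest that a newly tagged document carries."""
    targets: dict[str, set[str]] = {}
    for sub in subs:
        matched = doc_terms & expand_interests(sub, term_store)
        if matched:
            targets[sub.user] = matched
    return targets
```

Expanding Term Set subscriptions at match time means that when SMEs add a term to a set in the Taxonomy Manager, existing subscribers to that set start receiving matches without changing their subscriptions.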

Content Preparation

Certain characters are not allowed in SharePoint Online file and folder names. If a file or folder name on a Windows shared drive contains any of the characters " * : < > ? / \ |, it may prevent files and folders from uploading or syncing properly. ACES runs an analysis on the Client's X Drive to automatically find and fix file and folder names that contain these characters. This is one of many steps in the extensive file structure preparation process we carry out using proprietary tools that ACES built specifically to prepare federal government shared drives for migration to any of the GCC SharePoint Online environments.

We identify any names that are not allowed for files or folders, such as .lock, CON, PRN, AUX, NUL, COM0 through COM9, LPT0 through LPT9, _vti_, desktop.ini, or any filename starting with ~$. We check file sizes to ensure they do not exceed any cloud or customer-implemented size restrictions. We identify duplicate files and resolve them to ensure a smooth migration. We analyze shared drive folder paths and filenames to find and fix any cases that may exceed the 255-character URL limitation when folder names are concatenated with the filename to form the SharePoint URL. In some cases we work with content owners to restructure deeply nested folder structures, shorten folder names, or remove spaces from them, because each space is encoded as “%20” in the web address and therefore takes three characters instead of one.
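
A few of these checks are easy to express in code. The sketch below flags invalid characters, the reserved names listed above, the ~$ prefix, and _vti_, and it builds the encoded SharePoint URL for a shared-drive path so its length can be compared with the 255-character limit used in this plan. It is a minimal illustration, not ACES's proprietary tooling: the function names and the example library URL are hypothetical, reserved names are compared case-insensitively as an assumption, and _vti_ is treated as a substring check.

```python
from urllib.parse import quote

INVALID_CHARS = set('"*:<>?/\\|')
RESERVED_NAMES = {".LOCK", "CON", "PRN", "AUX", "NUL", "DESKTOP.INI"}
RESERVED_NAMES |= {f"COM{i}" for i in range(10)} | {f"LPT{i}" for i in range(10)}
MAX_URL_LENGTH = 255  # limit targeted in this plan; set to the tenant's actual limit

def segments(relative_path: str) -> list[str]:
    """Split a Windows (backslash) or URL-style (slash) relative path into names."""
    return [p for p in relative_path.replace("\\", "/").split("/") if p]

def name_issues(name: str) -> list[str]:
    """Problems with a single file or folder name."""
    issues = []
    bad = sorted(set(name) & INVALID_CHARS)
    if bad:
        issues.append("invalid characters: " + " ".join(bad))
    if name.upper() in RESERVED_NAMES:  # assumed case-insensitive comparison
        issues.append("reserved name")
    if name.startswith("~$"):
        issues.append("starts with ~$")
    if "_vti_" in name:  # treated here as a substring check
        issues.append("contains _vti_")
    return issues

def sharepoint_url(library_url: str, relative_path: str) -> str:
    """Build the percent-encoded URL a shared-drive path would get in a document library."""
    encoded = "/".join(quote(p) for p in segments(relative_path))
    return library_url.rstrip("/") + "/" + encoded

def path_issues(library_url: str, relative_path: str) -> list[str]:
    """Check every path segment, then the length of the resulting URL."""
    issues = []
    for part in segments(relative_path):
        issues.extend(f"{part}: {issue}" for issue in name_issues(part))
    url = sharepoint_url(library_url, relative_path)
    if len(url) > MAX_URL_LENGTH:
        issues.append(f"URL length {len(url)} exceeds {MAX_URL_LENGTH}")
    return issues
```

For example, sharepoint_url("https://example.sharepoint.com/sites/hr/Docs", r"Annual Reports\Q1 Summary.docx") returns "https://example.sharepoint.com/sites/hr/Docs/Annual%20Reports/Q1%20Summary.docx", showing how each space costs three characters in the final address.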

While we curate and prepare shared drive content for migration to GCC SPO, we stand up the conceptClassifier platform in parallel, including obtaining any certifications required to operate in the Office 365 GCC. We connect the conceptClassifier platform to the Client's X Drive and to the Client's GCC tenancy and SPO sites. We validate the taxonomies the Client wants to use to auto-classify and auto-tag its content. Once Concept Searching is fully operational, we use it to crawl the content on the X Drive and enrich the documents with the desired metadata, ranging from sensitive information tags to the names of folders in each file's path.
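
Turning folder names into metadata can be as simple as looking up each folder in a file's path in a term list. The sketch below assumes a hypothetical term_lookup dictionary exported from the Term Store (case-folded folder name to term label); in this plan Concept Searching performs the enrichment, so the code only illustrates the idea.

```python
from pathlib import PureWindowsPath

def folder_terms(file_path: str, root: str, term_lookup: dict[str, str]) -> list[str]:
    """Return Term Store terms matched by the folder names between root and the file.

    term_lookup maps case-folded folder names to term labels (hypothetical export).
    """
    relative = PureWindowsPath(file_path).relative_to(PureWindowsPath(root))
    terms: list[str] = []
    for folder in relative.parts[:-1]:  # every part except the file name itself
        term = term_lookup.get(folder.casefold())
        if term is not None and term not in terms:
            terms.append(term)
    return terms
```

For X:\Shared\Finance\Contracts\2021\NDA.docx with root X:\Shared and a lookup containing "finance" and "contracts", the result is ["Finance", "Contracts"]; the 2021 folder has no matching term and is skipped. A production run might instead report unmatched folders so SMEs can decide whether new terms are needed.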

Once an organization’s folder structure is prepared for migration, we ensure all the proper permission groups are established in the destination SPO libraries. We then set the shared drive folders to read-only, do one final scan of the organization’s files, and fix any new issues with filenames or other items. This completes the Content Preparation phase.
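
One way to keep the final scan small is to compare a snapshot taken during preparation with one taken after the folders are set to read-only, and re-check only paths that are new (including renamed or copied-in files) or whose modification time changed. This is a sketch under that assumption; snapshot and needs_recheck are hypothetical helpers, not part of ACES's tooling.

```python
import os

def snapshot(root: str) -> dict[str, float]:
    """Map each file's path (relative to root) to its modification time."""
    snap: dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            snap[os.path.relpath(full_path, root)] = os.path.getmtime(full_path)
    return snap

def needs_recheck(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Paths that are new since the earlier snapshot or whose modification time changed."""
    return sorted(
        path for path, mtime in after.items()
        if path not in before or mtime != before[path]
    )
```

Because renames show up as new paths, files whose names changed since the first pass are re-checked even though renaming a file does not update its modification time.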

Tagging Content

ACES uses in-house tools to analyze and curate content on the Client's X Drive. ACES uses Concept Searching to auto-classify and auto-tag shared drive documents in accordance with Client requirements. Our solution addresses many challenges associated with content management in areas such as search, content migration, content security, records management, and text analytics. By automating the classification and tagging process, we overcome the problems of manually tagging items with metadata and improve any information environment. Business rules automate classification decisions and enable governance policies to be enforced automatically, which mitigates risk and improves outcomes for content management and related processes. Our solution eliminates manual tagging and the human inconsistencies that prevent accurate metadata generation. We bring consistency, accuracy, and efficiency to file retrieval by implementing powerful taxonomy tools. We use auto-classification to identify content in context for users or line-of-business applications based on policy, process, or role. We integrate Concept Searching results with the SharePoint Managed Metadata Service to ensure compliance with government regulations while reducing costs associated with data exposures, remediation, litigation, fines, and sanctions.
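
A business rule in this sense pairs a classification outcome with a governance action. The sketch below assumes the classifier returns a score per term and that a rule fires when that score reaches a threshold; Rule, actions_for, and the action names are hypothetical and do not represent Concept Searching's rule syntax.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    term: str         # classification term the rule reacts to
    min_score: float  # classifier score needed to trigger the rule
    action: str       # governance action label (hypothetical names)

def actions_for(scores: dict[str, float], rules: list[Rule]) -> list[str]:
    """Return the governance actions triggered by a document's classification scores.

    Actions are listed in rule order, each at most once.
    """
    actions: list[str] = []
    for rule in rules:
        if scores.get(rule.term, 0.0) >= rule.min_score and rule.action not in actions:
            actions.append(rule.action)
    return actions
```

Keeping rules as data rather than code lets the same thresholds be reviewed by records managers and applied identically to every document, which is where the consistency gain over manual tagging comes from.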

Content Migration

Once all content curation and preparation is complete, including adding all metadata to the documents on the shared drive, the content will be migrated to the Office 365 GCC environment using Metalogix Content Matrix. Should there be issues with Content Matrix, or should the Client run out of funding for Metalogix, ACES has a custom tool set that can migrate the content automatically as a contingency option. Folder structures and the files they contain are replicated in the SharePoint Online (SPO) document libraries, and permissions are mapped using Metalogix's Active Directory (AD) permissions mapping capabilities.
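
Content Matrix performs the actual permission mapping; before a migration pass it can still help to check that every source principal has a destination. The sketch below assumes a hypothetical mapping table from AD groups to SharePoint groups and an assumed translation of NTFS rights (Read, Modify, FullControl) to SharePoint permission levels (Read, Contribute, Full Control). It does not use any Metalogix API, and map_acl is a hypothetical name.

```python
# Assumed translation of NTFS rights to SharePoint permission levels.
LEVEL_MAP = {"Read": "Read", "Modify": "Contribute", "FullControl": "Full Control"}
LEVEL_RANK = {"Read": 0, "Contribute": 1, "Full Control": 2}

def map_acl(acl: dict[str, str],
            group_map: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Translate a folder ACL (principal -> NTFS right) to SharePoint groups and levels.

    group_map keys are case-folded AD principal names; values are SharePoint group names.
    Returns the mapped permissions and the principals that could not be mapped.
    """
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    for principal, right in acl.items():
        group = group_map.get(principal.casefold())
        level = LEVEL_MAP.get(right)
        if group is None or level is None:
            unmapped.append(principal)
            continue
        # If several principals land in one SharePoint group, keep the strongest level.
        current = mapped.get(group)
        if current is None or LEVEL_RANK[level] > LEVEL_RANK[current]:
            mapped[group] = level
    return mapped, unmapped
```

Unmapped principals, such as a group with no SharePoint counterpart or a special NTFS right outside the assumed table, are the items to resolve before the migration pass rather than after it.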

Manage Content

After the migration, the Client may elect to continue using Concept Searching to automatically classify content and manage taxonomies, enabling concept-based searching and enterprise content management. The metadata repository we develop can be used post-migration in the Office 365 environment to improve any application that requires metadata. Our solution automatically monitors for sensitive documents and locks down or removes them during the migration process.

ACES understands that content roams everywhere, across devices, apps, and services, as people need to collaborate with users inside and outside the organization. We leverage Microsoft Office 365 sensitivity labels to classify and protect sensitive content across the organization and to enforce protection settings based on that classification. For example, we can apply a Confidential label to a document or email, and that label can encrypt the content and apply a Confidential watermark. Content is marked by adding custom watermarks, headers, or footers to emails or documents that have the label applied. After a sensitivity label has been applied to content, we implement endpoint protection to prevent that content from being copied to a third-party app or to removable storage. The classification we assign persists and roams with the content as it is used and shared, and it enables usage reports and activity data for the sensitive content.