Extract text from files
Automate tasks and speed up document ingestion by extracting text from files for processing.
With Squid's text extraction feature, you can process the contents of multiple large documents and get the information you need quickly. Every industry depends on documents that are integral to the business. From compliance documentation to earnings reports to scientific studies, there are always documents that need to be reviewed and actions that must be taken based on their contents. Manual ingestion of this information is time-consuming and error-prone. With Squid, tasks that once took hours or days are completed in seconds.
Use cases
- Turn the information in your unstructured documents into structured datasets: read your resources, process the text, and write the results to your data source. From there, you can query your data programmatically using the Client SDK, or use Query with AI to ask questions about your data, have Squid AI run the queries for you, and generate charts and graphs on the fly.
- Extract text to pass to an AI agent that can take relevant actions depending on the contents of the text.
Text extraction requires admin privileges, so it should only be performed in a secure environment with access to your Squid API key, such as the Squid backend or another server environment.
Create the extraction client
To perform text extraction on a document, first create an extraction client using the extraction() method:
const extractionClient = this.squid.extraction();
Extract the text
Use the extraction client's extractDataFromDocumentFile method to extract text from the file. The method takes one of two types: File or BlobAndFileName. The following example shows extracting text using the BlobAndFileName type:
const extractionClient = this.squid.extraction();
// dataBlob holds the contents of the document to extract text from
const data = {
  blob: dataBlob,
  name: 'myDocument.pdf',
};
const extractedResult = await extractionClient.extractDataFromDocumentFile(data);
console.log(extractedResult.pages[0].text); // 'Q4 Development Plan...'
The extractDataFromDocumentFile method returns a promise that resolves to a result object containing an array of pages. The text of a given page can be accessed using its text attribute.
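As a minimal sketch of working with the result, the helper below joins the text of every page into a single string. The ExtractedPage and ExtractedResult interfaces are assumptions that mirror only the shape used in the example above (a pages array whose items have a text attribute); they are not taken from the Squid SDK's type definitions, and joinPageText is a hypothetical helper name.

```typescript
// Assumed shapes, mirroring only what the example above uses.
interface ExtractedPage {
  text: string;
}

interface ExtractedResult {
  pages: ExtractedPage[];
}

// Hypothetical helper: trim each page's text and join the pages with a blank line.
function joinPageText(result: ExtractedResult): string {
  return result.pages.map((page) => page.text.trim()).join('\n\n');
}

// Stand-in data; in practice this would be the resolved extraction result.
const sample: ExtractedResult = {
  pages: [{ text: 'Q4 Development Plan ' }, { text: 'Milestones and owners' }],
};
console.log(joinPageText(sample));
```

Joining pages up front is convenient when the downstream step works on the whole document; keep the per-page array instead if page boundaries matter.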
Next steps
Once your text is extracted, you can:
- Parse the text and write the results to a database using Squid's Client SDK.
- Pass text as part of a query to a Squid AI Agent to answer questions or take actions based on the text.
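The parsing step in the first option could look something like the sketch below, which assumes the extracted text contains simple "Key: Value" lines. That layout and the parseKeyValueLines helper are hypothetical and chosen only for illustration; real documents usually need parsing rules specific to their format. Writing the resulting object to your data source with the Client SDK is not shown.

```typescript
// Hypothetical parser: collect "Key: Value" lines into a plain object.
function parseKeyValueLines(text: string): Record<string, string> {
  const record: Record<string, string> = {};
  for (const line of text.split(/\r?\n/)) {
    const separator = line.indexOf(':');
    if (separator <= 0) continue; // no colon, or nothing before it
    const key = line.slice(0, separator).trim();
    const value = line.slice(separator + 1).trim();
    if (key && value) record[key] = value; // ignore lines with an empty key or value
  }
  return record;
}

// Stand-in for the text of an extracted page.
const pageText = 'Title: Q4 Development Plan\nOwner: Platform team\nNotes';
console.log(parseKeyValueLines(pageText));
```

Only the first colon on a line is treated as the separator, so values that themselves contain colons (such as times or URLs) are kept whole.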