Tools Configuration
The Tools module provides utility functions for content processing and extraction. This guide covers configuration and usage of available tools.
Quick Start
Minimal Configuration
Configure the article extractor tool:
tools:
- id: extract
type: article_extractor
No additional configuration or API keys are required for the article extractor.
Using the Tool
Extract content from a URL:
curl http://localhost:${PROACTIONS_HUB_PORT}/tools/extract?url=https://example.com/article
Response:
{
"title": "Article Title",
"description": "Article description",
"author": "Author Name",
"published": "2024-01-15T10:30:00.000Z",
"content": "Full article text...",
"links": ["https://..."],
"image": "https://example.com/image.jpg"
}
Tools Configuration Structure
Each tool in the tools array has this structure:
tools:
- id: string # Required: Unique identifier
type: string # Required: Tool type
options: object # Optional: Tool-specific options
Required Fields
| Field | Description |
|---|---|
id | Unique identifier for the tool (used for logging and references) |
type | Tool type - currently only article_extractor is supported |
Optional Fields
| Field | Description | Default |
|---|---|---|
options | Tool-specific configuration options | {} |
Available Tools
Article Extractor
The article extractor uses @extractus/article-extractor to extract content from web pages.
Configuration
tools:
- id: extract
type: article_extractor
Endpoint
GET /tools/extract?url={url}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
url | string | Yes | URL of the web page to extract |
Response Fields
The extractor returns the following fields (when available):
| Field | Description |
|---|---|
title | Article title |
description | Meta description or summary |
author | Author name(s) |
published | Publication date (ISO 8601) |
content | Main article text content |
links | Array of URLs found in the article |
image | Featured image URL |
url | Source URL |
source | Source domain |
Example Usage
Basic extraction:
curl http://localhost:${PROACTIONS_HUB_PORT}/tools/extract?url=https://blog.example.com/article-title
Response:
{
"title": "How to Configure ProActions Hub",
"description": "A comprehensive guide to configuring ProActions Hub",
"author": "Jane Developer",
"published": "2024-01-15T10:30:00.000Z",
"content": "ProActions Hub is a powerful tool...",
"links": [
"https://docs.example.com",
"https://github.com/example/repo"
],
"image": "https://blog.example.com/images/cover.jpg",
"url": "https://blog.example.com/article-title",
"source": "blog.example.com"
}
Supported Content Types
The article extractor works best with:
- Blog posts and articles
- News articles
- Documentation pages
- Medium articles
- WordPress sites
- Generic HTML pages with article content
Limitations
The extractor may have difficulty with:
- JavaScript-heavy single-page applications (SPAs)
- Pages requiring authentication
- Paywalled content
- Dynamically loaded content
- PDFs and non-HTML documents
Logging
Tool operations are logged when tools logging is enabled:
logging:
types:
tools:
enabled: true
console: true
file: false
level: info
Log output includes:
- Tool type and ID
- Request URL
- Extraction success/failure
- Processing time
- Error details (if applicable)
See Logging Configuration for details.
Error Handling
Common Errors
Invalid URL
Error:
{
"statusCode": 400,
"message": "Invalid URL parameter"
}
Solution: Ensure the URL is properly formatted and URL-encoded.
Extraction Failed
Error:
{
"statusCode": 500,
"message": "Failed to extract content from URL"
}
Solutions:
- Verify the URL is accessible
- Check if the page requires authentication
- Try a different URL with clearer article structure
- Check network connectivity from the container
Timeout
Error:
{
"statusCode": 504,
"message": "Request timeout"
}
Solution: The target website may be slow or unresponsive. Consider configuring HTTP client timeout in hub settings.
Error Response Structure
All errors follow the NestJS standard format:
{
"statusCode": 400,
"message": "Error description",
"error": "Bad Request"
}
Advanced Configuration
HTTP Client Timeout
Configure timeout for tool HTTP requests in hub settings:
hub:
http_client:
timeout: 120000 # 2 minutes for slow websites
maxRedirects: 5
Custom User Agent
Tools use the same HTTP client configuration as the hub. To set a custom user agent, you would need to extend the tools module (contact your development team).
Testing Tools Configuration
Test Article Extraction
# Test with a known article URL
curl http://localhost:${PROACTIONS_HUB_PORT}/tools/extract?url=https://example.com/article
# Test with URL encoding
curl "http://localhost:${PROACTIONS_HUB_PORT}/tools/extract?url=https%3A%2F%2Fexample.com%2Farticle"
Test with Swagger UI
If Swagger is enabled:
- Navigate to
http://localhost:${PROACTIONS_HUB_PORT}/docs - Find the
/tools/extractendpoint - Enter a URL
- Execute the request
Verify Logging
Enable tools logging and check logs:
# Podman
podman logs -f proactionshub | grep tools
# Docker
docker compose logs -f | grep tools
Best Practices
Usage
- URL encode parameters - Always URL-encode the URL parameter
- Handle extraction failures - Not all pages can be extracted successfully
- Cache results - Consider caching extraction results to reduce load
- Validate URLs - Validate URLs client-side before sending requests
Security
- Validate target URLs - Be cautious of SSRF attacks
- Rate limiting - Consider implementing rate limiting for extraction endpoints
- Content filtering - Be aware that extracted content is user-controlled
Performance
- Set appropriate timeouts - Configure HTTP client timeout for slow websites
- Implement caching - Cache extraction results in your application
- Monitor extraction times - Track slow extractions through logging
- Use async processing - For large batches, process extractions asynchronously
Reliability
- Handle errors gracefully - Extraction can fail for many reasons
- Implement retries - Retry failed extractions with exponential backoff
- Log extraction failures - Monitor failure rates and patterns
- Provide fallbacks - Have alternative content sources when extraction fails
Future Tool Extensions
The tools module is designed to be extensible. Future tools might include:
- PDF text extraction
- Image processing
- Audio transcription
- Video processing
- Document conversion
Contact your development team if you need custom tools added to the module.
Next Steps
- Configure Logging to monitor tool operations
- Review Security Hardening for production deployments
- Operations Guide for monitoring and troubleshooting