Enhancing Productivity with AI
In the coming months, DBGallery is poised to significantly expand its AI capabilities, with a clear focus on enhancing user productivity in digital asset management (DAM).
Our approach will be to iteratively introduce features that were previously deferred, either because of limited client demand (Facial Recognition) or because the underlying AI was not yet mature (Retrieval-Augmented Generation, or RAG). RAG is a standout addition, made possible by recent advances in AI.
As a testament to this iterative approach, since this post was originally published in Autumn 2024 the product team has released AI-generated descriptions, text recognition, and facial recognition.
The list of upcoming enhancements:
- Retrieval-Augmented Generation (RAG)
- Natural Language Search
- AI for Video: voice transcription, facial recognition, text extraction, and object recognition tagging.
Before diving into these upcoming innovations, let's first revisit the current AI-driven capabilities within DBGallery.
Current AI Capabilities
DBGallery already integrates proven AI-driven features designed to streamline digital asset management:
- Object Recognition: Since 2018, DBGallery has utilized AI to automatically identify and tag common objects in photos. This early adoption significantly reduces the manual workload associated with metadata entry, providing immediate and substantial benefits.
- Custom-Trained AI Object Detection: Beyond generic object detection, DBGallery allows users to use custom-trained AI models tailored to their specific object detection requirements. See more on how this works in DBGallery in our blog post.
- Enhanced Image Descriptions: To help users find the images they need quickly, DBGallery asks an AI model (ChatGPT o1-mini) to generate a detailed description of each image, typically during upload. For example, when asked to describe the sample image, ChatGPT returned the following: "This image showcases a remarkable piece of public art set against the backdrop of a modern cityscape. The focal point is a large, ring-shaped sculpture covered in intricate Arabic calligraphy. The sculpture, likely made of metal, has a silvery sheen that reflects the sunlight, creating a striking contrast with the blue sky and the surrounding urban environment. Behind the sculpture, contemporary skyscrapers with glass facades create a dynamic skyline, highlighting the blend of cultural heritage and progress in this urban setting." These detailed descriptions are stored alongside other metadata in DBGallery's database, making them available to normal searches and significantly enhancing the effectiveness of natural language searches. Users can provide their own prompts to the AI to customize the descriptions. See full details in our knowledge base: AI-generated Descriptions
- Facial Recognition: Until recently, DBGallery opted not to include facial recognition because of limited client demand and privacy concerns. With AI awareness increasing among clients, we now offer this feature, aimed primarily at corporate and marketing firms. Privacy remains a priority: the feature is a convenience for those managing collections that include people, and it must be opted into with full knowledge of its privacy impact. See full details in our knowledge base: Facial Recognition
- Text Recognition: Also known as image-to-text, or OCR (optical character recognition), this feature extracts text from images and converts it into searchable metadata. This capability enables more accurate and comprehensive search results based on text found in photos. A good example is searching for photos that include company logos, where the text in those logos becomes searchable within DBGallery. It is also useful for finding data in photos of business cards or scanned documents. See full details in our knowledge base: Image-to-text (OCR)
Other AI-Related Current Intelligent Features
While these features do not rely on the latest frontier AI models, they nonetheless deliver substantial improvements in productivity and digital asset management capabilities:
- Reverse Geocoding: Upon image upload or location update, DBGallery automatically adds address information using reverse geocoding, enriching the metadata associated with each asset. This metadata feeds into the product's excellent maps capabilities.
- Smart Search: Our intelligent search function prioritizes results based on the relevance of matching terms, ensuring that users find the most pertinent assets quickly and efficiently.
- Find Related Images: Introduced in Release 14.0 (Spring 2024), this feature suggests images related to the currently selected photo, enhancing the user experience by surfacing relevant content.
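To make the idea behind relevance-based ranking concrete, here is a minimal sketch of scoring assets by how many query terms they match. It is not DBGallery's actual Smart Search algorithm, which this post does not describe: the field names, the weights in `FIELD_WEIGHTS`, and the functions `relevance` and `smart_search` are all assumptions chosen for illustration.

```python
import re

# Hypothetical weights: a match in the title counts more than one in the description.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "description": 1.0}

# Hypothetical assets; a real catalog would hold many more metadata fields.
ASSETS = [
    {"id": 1, "title": "Harbour bridge", "keywords": "bridge night",
     "description": "A bridge lit up at night"},
    {"id": 2, "title": "City skyline", "keywords": "city night",
     "description": "Skyline with a bridge in the distance"},
    {"id": 3, "title": "Forest trail", "keywords": "trees",
     "description": "A quiet path"},
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(query: str, asset: dict) -> float:
    """Sum the weighted count of query-term matches across metadata fields."""
    terms = set(tokenize(query))
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = tokenize(str(asset.get(field, "")))
        score += weight * sum(1 for word in words if word in terms)
    return score


def smart_search(query: str, assets: list[dict]) -> list[dict]:
    """Return assets with at least one match, most relevant first."""
    scored = [(relevance(query, asset), asset) for asset in assets]
    hits = [(score, asset) for score, asset in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [asset for _, asset in hits]


if __name__ == "__main__":
    for asset in smart_search("bridge night", ASSETS):
        print(asset["id"], asset["title"])
```

In this sketch an asset that matches the query in its title and keywords ranks above one that only mentions a term in its description, which is the general behaviour the Smart Search bullet describes.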
Upcoming: Leveraging Advanced AI Capabilities
Building on this solid foundation, DBGallery will soon introduce several advanced AI capabilities designed to further enhance productivity and user experience.
Retrieval-Augmented Generation (RAG)
DBGallery's forthcoming retrieval-augmented generation feature will significantly enhance how metadata is used in search queries. By sending image metadata, not the images themselves, to an AI model such as ChatGPT, DBGallery will enable more intelligent and contextually aware search results. When a user searches for "dogs on a beach," the AI will answer using the metadata provided by DBGallery, delivering precise and relevant results. The system will then parse the AI's response and display the identified images in the standard results window.
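The flow described above (retrieve metadata, send only that metadata to a model, parse the reply into image IDs) can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not DBGallery's implementation: the record layout, the prompt wording, the JSON reply format, and every function name here are hypothetical, and `fake_model` stands in for a real model call so the example runs offline.

```python
import json
import re
from typing import Callable

# Hypothetical metadata records: text only, never the image files themselves.
CATALOG = [
    {"id": 101, "description": "A golden retriever running along a sandy beach",
     "keywords": ["dog", "beach"]},
    {"id": 102, "description": "Glass skyscrapers at sunset",
     "keywords": ["architecture", "city"]},
    {"id": 103, "description": "Two dogs playing in the surf",
     "keywords": ["dogs", "ocean", "beach"]},
]


def words_of(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, catalog: list[dict], limit: int = 50) -> list[dict]:
    """Retrieval step: keep records sharing at least one word with the query."""
    terms = words_of(query)
    scored = []
    for record in catalog:
        text = record["description"] + " " + " ".join(record["keywords"])
        overlap = len(terms & words_of(text))
        if overlap:
            scored.append((overlap, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:limit]]


def build_prompt(query: str, records: list[dict]) -> str:
    """Augmentation step: put the retrieved metadata into the prompt."""
    return (
        "Using only the image metadata below, answer the search request.\n"
        'Reply with JSON of the form {"image_ids": [...]}.\n\n'
        f"Metadata:\n{json.dumps(records, indent=2)}\n\nRequest: {query}"
    )


def parse_image_ids(reply: str, allowed_ids: set[int]) -> list[int]:
    """Extract IDs from the model's reply, dropping any it was never given."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if not match:
        return []
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    ids = payload.get("image_ids", [])
    if not isinstance(ids, list):
        return []
    return [i for i in ids if isinstance(i, int) and i in allowed_ids]


def search(query: str, catalog: list[dict], ask_model: Callable[[str], str]) -> list[int]:
    """Run retrieval, prompt the model, and return the IDs to display."""
    candidates = retrieve(query, catalog)
    if not candidates:
        return []
    reply = ask_model(build_prompt(query, candidates))
    return parse_image_ids(reply, {record["id"] for record in candidates})


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; 999 simulates an invented ID."""
    return 'Matching images: {"image_ids": [101, 103, 999]}'


if __name__ == "__main__":
    print(search("dogs on a beach", CATALOG, fake_model))
```

Filtering the reply against the IDs that were actually sent is a deliberate choice in this sketch: it keeps an ID the model made up from reaching the results window.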
Example
In one RAG test, the metadata of one photographer's entire collection was used, spanning 30 years and 110,000 images. The question "What camera equipment was used?" produced the following very accurate answer:
The sources, which describe John D.'s photography collection, show that he has used cameras from at least three different brands: Olympus, Canon, and Samsung.
- The Olympus C5050Z is mentioned in multiple entries from the early 2000s. John D. also used an Olympus C3030Z for photos taken between 1994 and 2001. Finally, one photo from 2008 was captured using a Canon PowerShot G9.
- A Canon EOS 20D appears in multiple entries, for images captured between 2004 and 2013.
- John appears to have transitioned to Samsung smartphone cameras in the 2010s. The specific models mentioned include the SGH-I337M, SM-G925W8, SM-G955W, SM-N960F, SM-N986W, and SM-S908E. John switched to a Samsung Galaxy S23 Ultra in 2024.
- Although many entries in the sources do not specify the camera model used, it is likely that the majority of the images captured after 2014 were taken using Samsung smartphone cameras, based on the identified trend.
Natural Language Search
Many readers of this post will be used to talking to ChatGPT and other chatbots. Inspired by that ease of interaction, DBGallery will soon allow users to perform searches using natural language queries. Building on the retrieval-augmented generation capability above, this feature will let users interact with DBGallery as they would in a conversation, making it easier to find exactly what they need. For example:
- "Are there photos of a dog running on a beach in my collection?"
- "How many of those were taken in the autumn season?"
- "What's the total file size of the images currently displayed?"
- "Show me images that are good examples of modern architecture."
- "Which images have copyright information?"
While many of these inquiries can already be addressed within DBGallery, this new feature will make the search process more intuitive. This will be especially true for new or infrequent users, who will need very little knowledge of DBGallery's extensive image-finding abilities. For all users, it will also introduce capabilities not currently available, such as querying the total file size of displayed images, asking which images are missing keywords, or, for system managers, asking activity-monitoring questions (e.g. "How many files have been downloaded from shared links in the past week?").
Image 1: The early draft layout showing a natural language search: "Show photos of Vietnam architecture". Along with the image results, the full AI text response can optionally be shown in a right-side panel.
The initial rollout will focus on search functionality, with plans to later expand into action commands, such as "Download all images of dogs running on a beach," or "Move the displayed files to a new DBGallery collection called 'happy K9s'."
AI for Video
With facial recognition, text extraction, and object recognition tagging already available for photos and most graphics formats, the product team is currently working on the following for video:
- Voice transcription
- Facial recognition
- Text extraction
- Object recognition tagging
Summary
By embracing these advanced AI capabilities, DBGallery continues to empower digital asset management professionals with tools that not only streamline workflows but also unlock new levels of efficiency and insight. Stay tuned for these exciting developments as we continue to push the boundaries of what's possible in a digital asset management system.
Given privacy considerations, these features will be optional, allowing companies to activate them in system preferences based on their needs and privacy policies.
Related FAQs
Q: Can DBGallery's AI capabilities be customized for specific industries?
A: Yes, DBGallery offers a custom-trained AI object detection feature, allowing businesses to tailor AI recognition to their unique needs. This includes training the AI to detect industry-specific objects, proprietary products, and unique branding elements. Examples include:
- Property Assessment: Lane Consulting Services used custom-trained AI to recognize specific apartment building features or defects for Physical Needs Assessment reports and Facilities Studies.
- Architecture, Design, and Construction: AI can be tailored to identify particular architectural elements, design features, or construction materials.
Q: How do these new capabilities impact data privacy?
A: When using cloud-based AI services such as Amazon Rekognition (AWS) or Microsoft's Azure Vision, images are uploaded to, and may be stored on, the provider's servers. To avoid this, clients can opt for on-premises tools such as OpenCV, ensuring data stays in-house.
Q: How long does it take to set up a custom-trained AI in DBGallery?
A: Training the AI model is the most time-intensive aspect, typically taking from two days to three months or more, depending on the complexity. Once trained, integrating the model with DBGallery is straightforward. Ongoing retraining may be necessary to refine the model and maintain high accuracy. Highly experienced personnel here at DBGallery can aid with this effort, shortening the required time and improving accuracy.
