Category Archives: Python

Migrating from Gallery Menalto 1.x to Piwigo: An Open Source Solution

Introduction

Migrating a large collection of photos and albums from an outdated photo gallery software can be a daunting task. With the aim of addressing this challenge, I have developed a robust framework to facilitate the migration from Gallery Menalto to Piwigo. This open-source project, licensed under GPL 3, not only aids in migrating to Piwigo but also provides the flexibility to migrate to any photo gallery software with an open API.

Background

Around 2004, I deployed a self-hosted photo gallery to organize, store, and maintain digital photos. At the time, this was a new area for me, and I had little experience managing digital images. The motivation was clear: my first daughter had just been born, and I wanted to ensure that these precious memories were easily accessible not only for myself but also for other family members, including grandparents.

For various reasons, I chose Gallery Menalto as my gallery software and ran it on OpenBSD 4.0 with a Geeklog 1.3 website. This setup was hosted in my closet in an apartment in Finland. Ten years later, several more daughters had been born, and I had moved to Canada. Despite the passage of time, both Geeklog and Gallery Menalto were still running on the same old OpenBSD system, which I had virtualized on top of VMware Server 2.0. The whole system had moved from the closet to the basement.

By then, I had amassed over 10,000 photos, and I was increasingly stressed about how to migrate them from the legacy system. Although I had attempted several times to upgrade from OpenBSD to Linux, the process was far from simple.

Another ten years passed, and the system was back in Finland, still running a virtualized OpenBSD on top of VMware Player. I couldn't even get the VM to run on libvirt, because OpenBSD 4.0 can't handle it. Upgrading had become even more challenging; what was a legacy system ten years ago had become practically prehistoric.

I searched extensively for a solution, but nothing seemed to work. The Gallery version was 1.4, which couldn't be upgraded to 2.x and then to 3.x, making the transition to something like Piwigo impossible. OpenBSD hadn't received any updates either, but the environment was chroot-jailed, and Gallery 1.x, with its well-known internals, didn't pose significant additional security risks – it was wide open anyway.

Since no ready-made solution was available, I decided to create my own. I started by migrating all the articles from Geeklog to WordPress, a task that took about two nights and was relatively straightforward – as straightforward as anything can be when the legacy source system forces you to run a gateway MySQL in a Docker container and glue everything together with a bunch of Python scripts. Next, I delved into the internals of Gallery 1.4, experimenting through trial and error. Within a few nights, I had a plan.

Instead of attempting to migrate everything at once, I decided to collect the metadata from Gallery 1.x in one phase and perform the actual migration in a second, separate phase. This approach meant that the first phase was not dependent on the second, allowing me to choose any target gallery software if I decided against Piwigo later on.

Once this idea took shape, I managed to get everything working relatively quickly. I started the migration yesterday, and as of now, I have successfully completed it. I had to make a few changes to the code along the way, but the final version is now available in my GitHub repository.

Project Overview

The project comprises two main Python scripts:

  1. Collect Gallery Metadata (collect_gallery_meta_data.py): This script gathers meta information for all albums and photos from Gallery Menalto.
  2. Execute Migration (execute_migration.py): This script performs the actual migration of albums and photos, along with their metadata, to Piwigo.

Collecting Gallery Metadata

The collect_gallery_meta_data.py script traverses the albums and photos in a Gallery Menalto installation, collecting crucial metadata and storing it in a MySQL database (or any SQLAlchemy-supported database). This metadata includes:

  • Album name
  • Parent album
  • Album caption
  • Album title
  • Album description
  • Photo filename
  • Photo caption
  • Photo title
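
To make this concrete, here is a minimal SQLAlchemy sketch of the two tables. Column names follow the DDL listed under Getting Started below, but the mapping itself is illustrative rather than the repository's actual code:

from sqlalchemy import Column, Integer, String, Text, Boolean, ForeignKey
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Album(Base):
    __tablename__ = 'albums'
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    parent_id = Column(Integer, ForeignKey('albums.id'))
    title = Column(String(255))
    caption = Column(Text)
    description = Column(Text)
    migrated = Column(Boolean, default=False)    # set once the album exists in Piwigo

class Photo(Base):
    __tablename__ = 'photos'
    id = Column(Integer, primary_key=True)
    album_id = Column(Integer, ForeignKey('albums.id'))
    filename = Column(String(255), nullable=False)
    caption = Column(Text)
    description = Column(Text)
    url = Column(String(255))
    downloaded = Column(Boolean, default=False)  # photo fetched from Gallery
    uploaded = Column(Boolean, default=False)    # photo pushed to Piwigo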

How It Works

  1. Command-line Argument: The script takes an album located in the Gallery 1.x root as a command-line argument.
  2. Traversal: It recursively goes through all sub-albums and photos, collecting metadata.
  3. Storage: The collected data is stored in the database tables (albums and photos).

Executing the Migration

The execute_migration.py script uses the collected metadata to perform the migration to Piwigo over Piwigo's web service API (sketched after the list). It:

  • Creates the root album and sub-albums in Piwigo.
  • Uploads photos along with their metadata (capture date, upload date, caption, title, description) to Piwigo.
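
The snippet below is not the project's actual code; it is a rough sketch of the kind of Piwigo web service calls involved, using the standard pwg.session.login, pwg.categories.add, and pwg.images.addSimple methods with made-up credentials, album, and file names:

import requests

PIWIGO_API_URL = "https://[piwigo_host]/piwigo/ws.php"

# Log in once; the session cookie authenticates the subsequent calls
session = requests.Session()
session.post(PIWIGO_API_URL, data={
    "format": "json",
    "method": "pwg.session.login",
    "username": "admin",    # hypothetical credentials
    "password": "secret",
})

# Create an album and capture the id Piwigo assigns to it
resp = session.post(PIWIGO_API_URL, data={
    "format": "json",
    "method": "pwg.categories.add",
    "name": "Summer 2004",  # hypothetical album name
}).json()
category_id = resp["result"]["id"]

# Upload one photo into the new album together with its title and caption
with open("photo.jpg", "rb") as fh:
    session.post(PIWIGO_API_URL, data={
        "format": "json",
        "method": "pwg.images.addSimple",
        "category": category_id,
        "name": "Photo title",
        "comment": "Photo caption",
    }, files={"image": fh})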

Key Features

  1. Download Photos: Photos are downloaded from Gallery Menalto only once. The field downloaded is set to 1 when the download is complete.
  2. Upload Photos: Once a photo is successfully uploaded to Piwigo, the field uploaded is set to 1.
  3. Track Migration: The albums table has a column migrated, which is set to 1 once an album is migrated. This ensures the process can be safely resumed if interrupted.
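
Because of these flags, you can check how far a run has progressed directly in the database, for example:

SELECT COUNT(*) AS pending FROM photos WHERE uploaded = 0;
SELECT name FROM albums WHERE migrated = 0;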

Advantages and Applications

This framework is highly beneficial for users who wish to transition from Gallery Menalto to modern photo gallery software. Its primary advantages include:

  1. Flexibility: While designed for Piwigo, the framework can be adapted for any photo gallery software with an open API.
  2. Efficiency: Metadata and photos are efficiently handled and migrated, ensuring data integrity and completeness.
  3. Open Source: As an open-source project under the GPL 3 license, it encourages collaboration and customization.

Getting Started

Clone the Repository:

git clone https://github.com/elsonico/gallery-piwigo-migration.git
cd gallery-piwigo-migration

Set Up the Database:

Ensure you have a MySQL database (or any SQLAlchemy-supported database) ready. Here’s the DDL for the two tables:

albums

CREATE TABLE `albums` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `parent_id` int DEFAULT NULL,
  `metadata` text,
  `meta` text,
  `title` varchar(255) DEFAULT NULL,
  `caption` text,
  `description` text,
  `migrated` tinyint(1) DEFAULT '0',
  `created` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `parent_id` (`parent_id`),
  CONSTRAINT `albums_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `albums` (`id`)
)

photos

CREATE TABLE `photos` (
  `id` int NOT NULL AUTO_INCREMENT,
  `album_id` int DEFAULT NULL,
  `filename` varchar(255) NOT NULL,
  `caption` text,
  `metadata` text,
  `meta` text,
  `type` varchar(10) DEFAULT NULL,
  `description` text,
  `url` varchar(255) DEFAULT NULL,
  `downloaded` tinyint(1) DEFAULT '0',
  `uploaded` tinyint(1) DEFAULT '0',
  `capturedate` datetime DEFAULT NULL,
  `uploaddate` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `album_id` (`album_id`),
  CONSTRAINT `photos_ibfk_1` FOREIGN KEY (`album_id`) REFERENCES `albums` (`id`)
)

Collect Metadata

Before starting the metadata collection, you need a few environment variables set, since the code expects them:

  • GALLERY_BASE_URL
    • This is your gallery URL including your album location: https://[gallery_host]/[albums]
    • This is the location where you can find your albums.dat
  • DATABASE_URL
    • This is your MySQL URL: mysql+pymysql://[user]:[password]@[host]/[database]
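
For example, with made-up host, credentials, and database names:

export GALLERY_BASE_URL="https://gallery.example.com/albums"
export DATABASE_URL="mysql+pymysql://gallery:secret@localhost/gallery_mig"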

Once everything above is in place, including the database tables, you are good to go:

python collect_gallery_meta_data.py [root_album]

Execute Migration

As with collecting the metadata, the migration needs several environment variables set:

  • GALLERY_BASE_URL
    • Same as above depending on your configuration: https://[gallery_host]/[albums]
  • PIWIGO_API_URL
    • This is: https://[piwigo_host]/piwigo/ws.php
  • PIWIGO_USERNAME / PIWIGO_PASSWORD
    • Piwigo credentials with privileges to create albums and add photos
  • MIG_DB_HOST / MIG_DB_NAME
    • The host and name of the database that stores the albums and photos tables
  • MIG_DB_USER / MIG_DB_PASSWORD
    • Credentials for the migration database
  • PW_DB_HOST / PW_DB_USER / PW_DB_PASSWORD / PW_DB_NAME
    • The Piwigo database host, database and credentials

Once you are certain the environment variables above are correctly set, you are ready to migrate the Gallery albums to Piwigo, album by album, starting from the root album.

The migration starts with the command below:

python execute_migration.py [root_album]

The program expects to find album information collected for [root_album], so you must ensure the data has been collected with collect_gallery_meta_data.py before getting to this phase. Note also that you can edit the metadata in the database tables if you want to change anything there.
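
For example, to fix an album title before it is migrated (hypothetical values):

UPDATE albums SET title = 'Summer 2004' WHERE name = 'summer2004' AND migrated = 0;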

Conclusion

Migrating from Gallery Menalto to Piwigo, or any other modern photo gallery software, is now more manageable with this open-source framework. By leveraging the power of Python and SQLAlchemy, this project ensures a seamless transition, preserving all your precious photo memories. I warmly welcome contributions and collaboration to further enhance and extend this tool.

Building a DIY AI Chatbot: Control Your Conversations

Introduction

A self-built AI chatbot is crafted entirely by an individual or team from scratch, without relying on pre-existing templates or platforms. This approach gives developers complete autonomy over the coding, features, and functionalities of the chatbot.

Creating a self-built AI chatbot demands a blend of programming expertise, a deep understanding of artificial intelligence, and inventive thinking. Developers can use a variety of programming languages, including Python, Java, or JavaScript, based on their preferences and the chatbot’s intended application.

One of the standout advantages of a self-built AI chatbot is its high level of customization. Developers can fine-tune the chatbot’s responses and functionalities to meet specific needs and objectives. Moreover, they can continually refine and enhance the chatbot on their own timetable, independent of external updates or support.

Getting started

Building a chatbot from scratch might seem daunting, but it’s quite feasible with the right tools. I used the OpenAI API and the Python openai library (version 1.23.2 as of this writing). While GPT-4 typically suggests using openai==0.28, the transition to versions above 1.0 signifies substantial changes and necessitates thoughtful consideration. However, this doesn’t mean that ChatGPT cannot assist in coding—it can, though it requires precise instructions.
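
For orientation, this is roughly what a chat completion looks like with the post-1.0 interface: a minimal sketch assuming the OPENAI_API_KEY environment variable is set, not the bot's actual code:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a friendly chatbot."},
        {"role": "user", "content": "Hello there!"},
    ],
)
print(response.choices[0].message.content)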

Technical setup

For my project, the technical foundation included:

  • Python 3.9.x or higher: I chose Flask as the application server.
  • Access to the OpenAI API: Essential for integrating the AI logic into the chatbot.

This setup is sufficient to establish a testing environment for the AI logic, connecting the Python code to the OpenAI API.
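
A test environment along these lines can be as small as one Flask route that forwards the user's message to the API. This is a minimal sketch with a hypothetical /chat endpoint, not the production code:

from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()

@app.route("/chat", methods=["POST"])
def chat():
    # Forward the user's message to the model and return the reply as JSON
    user_message = request.json["message"]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": user_message}],
    )
    return jsonify({"reply": response.choices[0].message.content})

if __name__ == "__main__":
    app.run(debug=True)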

Advanced configuration

After thorough testing, I moved on to production. I continued using Flask for its simplicity, but added Gunicorn as the production WSGI server in front of it. The application runs either as a standalone version or embedded within a WordPress blog.
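
Assuming the Flask application object lives in app.py, running it under Gunicorn is a one-liner (module name, port, and worker count are hypothetical):

gunicorn --bind 0.0.0.0:8000 --workers 2 app:app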

I explored different operational models, including storing interactions in a database and the Bring Your Own Data (BYOD) model, although the latter’s impact on performance is still unclear. Initially, I deployed the gpt-3.5-turbo-instruct model for its speed and contextual retention. However, for superior output quality, I ultimately chose GPT-4 despite its slower response time.

The AI Bot Herself

The embedded chatbot uses the gpt-3.5-turbo-instruct model, whereas the one at the links below uses gpt-4. The latter needs a bit more time to think, but she will get there… You can compare the results.

Conclusions

A self-built AI chatbot can serve myriad purposes—customer support, entertainment, educational assistance, or personal aid, and can be integrated across websites, messaging platforms, or mobile apps.

For me, the project was primarily an exploration of AI technologies and the OpenAI API. It was also an invaluable learning experience in Python, application servers, and container technologies.

Building an AI chatbot from scratch is undoubtedly a complex, resource-intensive endeavor that necessitates ongoing updates and maintenance. Yet the potential for continuous learning and improvement through natural language processing and machine learning algorithms makes it increasingly efficient and precise over time.

From a Friday morning start to a productive Monday evening, my journey with this project underscores the potential and versatility of AI technologies, making a self-built AI chatbot a potent, customizable tool for any tech-driven initiative.

Streamlining Your Database Migration: A Guide to Leveraging OpenAI API for Seamless Assessments

Database migration is a complex process that demands careful assessment to ensure data integrity, application performance, and overall system reliability. The OpenAI API, with its advanced natural language processing capabilities, offers a way to simplify this process by automating assessments and summarizing key points. This guide will walk you through using the AWS Schema Conversion Tool (AWS SCT) for initial assessments, integrating the OpenAI API with Python to generate assessment summaries, and understanding the requirements for connecting with Azure OpenAI API, as well as its differences from ChatGPT OpenAI.

Kickstarting Your Migration: Utilizing AWS SCT for Comprehensive Database Assessment

The Amazon Web Services Schema Conversion Tool (AWS SCT) simplifies database migration from one platform to another. It assesses your existing database schema and generates a detailed report on potential migration issues. Supporting a wide range of source and target databases, AWS SCT is versatile for many migration scenarios.

AWS SCT examines your database schema, identifies non-convertible elements, and produces a comprehensive report. This report, containing potential action items, is crucial for planning your migration, offering an overview of the complexity, potential challenges, and the effort required.

The report, in PDF format, provides a detailed view of your database schema, potential issues, and recommendations. While invaluable for database administrators and engineers, the report’s extensive and complex nature makes OpenAI API a perfect tool for simplification and summarization.

Transforming PDFs into Comprehensive Assessment Summaries

With the AWS SCT report in hand, the next step is to utilize OpenAI API’s sophisticated natural language processing capabilities. By reading and understanding the PDF report, OpenAI can extract key points and summarize the information in a more accessible format.

Using the Python package pymupdf, we scan the PDF and convert its contents to text. This text is then fed to OpenAI API to highlight important sections and summarize the findings, including potential issues and recommended actions.

The Python method process_directory reads each PDF, converts it to text, and then passes this text to another method, generate_summary, which calls the OpenAI API to generate a concise assessment summary.
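
Both methods rely on a small helper, extract_text_from_pdf, which takes only a few lines with pymupdf (imported as fitz). A minimal sketch; the actual implementation may differ:

Method: extract_text_from_pdf()

import fitz  # pymupdf

def extract_text_from_pdf(file_path):
    """Extracts plain text from every page of a PDF file."""
    text = ""
    with fitz.open(file_path) as doc:
        for page in doc:
            text += page.get_text()
    return text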

Method: process_directory()

import os

def process_directory(directory):
    """Processes each PDF file in the given directory to generate a summary."""
    # Directory names are expected to follow the pattern <hostname>_<port>_<database>
    hostname, port_number, database_name = directory.split('_')
    for file in os.listdir(directory):
        if file.endswith('.pdf'):
            file_path = os.path.join(directory, file)
            pdf_text = extract_text_from_pdf(file_path)
            summary = generate_summary(pdf_text)
            print(f"Summary for {file} ({hostname}, {port_number}, {database_name}):\n{summary}\n")

Method: generate_summary()

import openai

def generate_summary(text):
    """Generates a summary for the given text using OpenAI's API."""
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a database "
                                          "reliability engineer providing a migration "
                                          "assessment summary."},
            {"role": "user", "content": "Summarise the output "
                                        "of the assessment text:\n" + text}
        ],
        temperature=0.4,
        max_tokens=150,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    summary = response.choices[0].message.content.strip()
    return summary

Understanding OpenAI API Parameters

Understanding the role and impact of various OpenAI API parameters is crucial for tailoring your query results. Here’s a brief overview:

temperature (0.4): This parameter controls the level of creativity or randomness in the responses generated by the model. A lower temperature, such as 0.4, results in more predictable and conservative outputs. Conversely, a higher temperature encourages diversity and creativity in the answers.

max_tokens (150): Specifies the maximum length of the generated response, measured in tokens (chunks of text roughly corresponding to words or parts of words). Setting this to 150 means the response will not exceed 150 tokens, ensuring concise and to-the-point answers.

top_p (1.0): Also known as “nucleus sampling,” this parameter filters the model’s token generation process. A value of 1.0 means no filtering is applied, allowing any token to be considered. Lowering this value helps in focusing the response generation on more likely token sequences, potentially enhancing relevance and coherence.

frequency_penalty (0.0): Adjusts the likelihood of the model repeating the same line of text. A value of 0.0 implies no penalty on repetition, enabling the model to freely reuse tokens. Increasing this value discourages repetition, fostering more varied and dynamic outputs.

presence_penalty (0.0): Penalizes tokens that have already appeared at all, nudging the model toward new topics. A value of 0.0, as used here, applies no such penalty.

The Python methods above generated a modest summary from my sandbox environment. Modest at this point; we can take this much further though. Below is a small part of the whole summary, describing the migration effort from a Microsoft SQL Server 2019 database to RDS for PostgreSQL.

Migration Plan Summary

Source Database:

  • AdventureWorks2019.MSSQL
  • Microsoft SQL Server 2019 (RTM-CU22-GDR) – 15.0.4326.1 (X64)
  • Standard Edition (64-bit) on Windows Server 2019 Datacenter
  • Case sensitivity: OFF

Target Platform:

  • AWS RDS for PostgreSQL

Assessment Findings:

  • Storage Objects: 100% can be converted automatically or with minimal changes.
  • Code Objects: 77% can be converted automatically or with minimal changes.
  • Estimated 99.9% of code can be converted to AWS RDS for PostgreSQL automatically.
  • 515 conversion actions recommended, ranging from simple to complex tasks

The AI-generated summary above can be a significant time saver for database administrators and engineers. Instead of going through pages of detailed reports, they can quickly glance through the summary and understand the key points. It can also serve as a reference guide during the migration process, helping to avoid potential issues and ensuring a smooth transition.

Building a Fully Automated OpenAI-Powered Python Module for PDF Analysis and Summary Generation

To generate the assessment summary using the OpenAI API, I developed the Python methods described above. These methods are components of a larger assessment framework that I’m currently developing. In this article, we focus exclusively on the integration with the OpenAI API. It’s worth noting that the PDF files used as input are generated through a fully automated process. However, the details of that process are beyond the scope of this blog post.

Python, with its versatility and powerful capabilities, is ideal for integrating with the OpenAI API. It offers libraries for API interactions and processing PDF files, enabling the automation of the entire workflow—from reading PDF files to generating summaries.

For the initial step, libraries such as PyPDF2, PDFMiner, or pymupdf—which I prefer—can be utilized to read the contents of PDF files. After extracting the text, this information can be processed by the OpenAI API. The API is designed to analyze the text, pinpoint the essential information, and compile a concise summary.

Subsequently, this summary can be saved either as a text file or within a database for easy access in the future. Moreover, the module can be configured to insert summaries into a database table, integrating them into a larger assessment data repository. This data can then be leveraged for generating reports, such as Power BI dashboards or other forms of reporting, allowing key stakeholders to stay informed about the migration process’s progress.

Setting Up Azure OpenAI API: Essentials and Differences from ChatGPT

The Azure OpenAI API is a cloud-based service enabling developers to integrate OpenAI’s capabilities into their applications. To utilize the Azure OpenAI API, one must have an Azure account and subscribe to the OpenAI service, in addition to generating an API key for authentication during API requests.

There are notable differences between utilizing ChatGPT and the Azure OpenAI API.

For ChatGPT, your Python module only requires the openai.api_key to be set, along with specifying the model, such as “gpt-4” in my example code. However, integrating with the Azure OpenAI API necessitates additional configuration:

    openai.api_base = os.getenv('AZURE_OPENAI_ENDPOINT')
    openai.api_key = os.getenv('AZURE_OPENAI_API_KEY')
    openai.api_version = os.getenv('AZURE_OPENAI_VERSION')
    openai.api_type = "azure"
    deployment_id = os.getenv('AZURE_OPENAI_DEPLOYMENTID')
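
Note that the snippet above uses the pre-1.0 module-level configuration. On openai>=1.0, the same settings move onto an AzureOpenAI client object instead; a sketch reusing the same environment variables:

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT'),
    api_key=os.getenv('AZURE_OPENAI_API_KEY'),
    api_version=os.getenv('AZURE_OPENAI_VERSION'),
)

response = client.chat.completions.create(
    model=os.getenv('AZURE_OPENAI_DEPLOYMENTID'),  # deployment name, not a model name
    messages=[{"role": "user", "content": "Hello"}],
)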

It’s important to note that when using Azure OpenAI, the Python OpenAI API parameter model corresponds to your specific deployment name, rather than a model name such as “gpt-4” as in my earlier ChatGPT examples.

The Azure OpenAI API and ChatGPT OpenAI both offer advanced natural language processing capabilities, albeit tailored to different use cases. The Azure OpenAI API is specifically designed for embedding AI functionalities into applications, whereas ChatGPT OpenAI excels in conversational AI, facilitating human-like text interactions within applications.

Choosing between the two for summarizing database migration assessments hinges on your project’s unique needs. Azure OpenAI API is the preferable option for projects requiring deep AI integration. On the other hand, if your application benefits from conversational AI features, ChatGPT OpenAI is the way to go.

In summary, utilizing the OpenAI API can drastically streamline the database migration assessment process. The AWS Schema Conversion Tool yields a thorough report on your database schema and potential issues, which can efficiently be condensed using the OpenAI API. By developing a Python module, this summarization process becomes automated, thus conserving both time and resources. Regardless of whether Azure OpenAI API or ChatGPT OpenAI is chosen, each offers potent AI capabilities to facilitate your database migration endeavors.