Caoimhe

The story so far: I was using Cusdis to provide a comments section for the blog, but it proved to be broken and unmaintained, so I replaced it with a self-hosted instance of Comentario.

I am going to walk through what I did to set up Comentario and import old comments from Cusdis. This is not a guide: the scripts posted below have serious problems that should be fixed before being used, I am not going to be the one to fix them, and you would obviously need to change any references to oakreef.ie to your own site.

Subdomain

First of all I needed an address to host the Comentario instance at. I chose a new subdomain, comments.oakreef.ie, and had to update my Let’s Encrypt certificates to cover it. I did not save the commands I used to do that, but it was pretty straightforward from the command line.
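
I didn’t keep the exact certbot command, but expanding the existing certificate with the nginx plugin would have been something along these lines (the full domain list depends on what the certificate already covers):

sudo certbot certonly --nginx --expand -d oakreef.ie -d comments.oakreef.ie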

Docker

Then I installed Docker on my server and, following Damien’s example with a few tweaks, created my docker-compose.yml and secrets.yaml files.

docker-compose.yml

version: '3'

services:
  db:
    image: postgres:17-alpine
    environment:
      POSTGRES_DB: comentario
      POSTGRES_USER: {INSERT POSTGRES USERNAME HERE}
      POSTGRES_PASSWORD: {INSERT POSTGRES PASSWORD HERE}
    ports:
      - "127.0.0.1:5432:5432"

  app:
    restart: unless-stopped
    image: registry.gitlab.com/comentario/comentario
    environment:
      BASE_URL: https://comments.oakreef.ie/
      SECRETS_FILE: "/secrets.yaml"
    ports:
      - "5050:80"
    volumes:
      - ./secrets.yaml:/secrets.yaml:ro

secrets.yaml

postgres:
  host:     db
  port:     5432
  database: comentario
  username: {INSERT POSTGRES USERNAME HERE}
  password: {INSERT POSTGRES PASSWORD HERE}

Changing the ports configuration to 127.0.0.1:5432:5432 means that the Postgres database is only accessible locally on the server and is not exposed publicly. I also haven’t set up email for the Comentario instance yet.

Launching the instance is then just a matter of:

sudo docker compose -f docker-compose.yml up -d
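
To check that both containers actually came up, the status and logs can be inspected with something like:

sudo docker compose -f docker-compose.yml ps
sudo docker compose -f docker-compose.yml logs -f app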

Nginx

Then I needed to modify my Nginx config to direct comments.oakreef.ie to the Comentario instance running on port 5050.

server {
	server_name comments.oakreef.ie;

	listen 443 ssl;

	ssl_certificate     /etc/letsencrypt/live/oakreef.ie/fullchain.pem;
	ssl_certificate_key /etc/letsencrypt/live/oakreef.ie/privkey.pem;
	include /etc/letsencrypt/options-ssl-nginx.conf;
	ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;

	location / {
		proxy_pass http://127.0.0.1:5050;
		proxy_redirect off;
		proxy_http_version 1.1;
		proxy_cache_bypass $http_upgrade;
		proxy_set_header Upgrade $http_upgrade;
		proxy_set_header Connection keep-alive;
		proxy_set_header Host $host;
		proxy_set_header X-Real-IP $remote_addr;
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
		proxy_set_header X-Forwarded-Proto $scheme;
		proxy_set_header X-Forwarded-Host $server_name;
		proxy_buffer_size 128k;
		proxy_buffers 4 256k;
		proxy_busy_buffers_size 256k;
		add_header Cache-Control "private";
	}
}
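
After adding that server block, the usual syntax check and reload picks it up (assuming Nginx is managed by systemd):

sudo nginx -t
sudo systemctl reload nginx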

Importing comments

Once there were a few comments on the new system I used Comentario’s export feature to get a JSON file and looked at how it defines comment data. I also manually went through all the comments on the old system and made a basic CSV file of them, with the author name, the date posted, the URL of the post the comment was on, and the text of each comment. I then wrote the Python script below to take the exported Comentario comments (basedata.json) and the CSV of old Cusdis comments (comments.csv) and write out a new file with the combined data in the Comentario format.

There are some problems with this!

  1. When importing data Comentario does not check for duplicates. Doing this I ended up creating duplicates of all the new Comentario comments that already existed on the site and had to delete them manually. If you are doing this, do not include existing comments in the file you create to import.
  2. I did not include replies at all. I decided to try importing the replies I had made to people as a second, separate step (see the second Python script below). This made things more awkward down the line. Do everything in one batch.
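
For reference, comments.csv has no header row and each row is just author, date, path and comment text. These example rows are made up, but the format looks like this:

Saoirse,2023-08-02 21:15,/posts/example-post/,"Great post!"
Niamh,2023-09-10 09:30,/posts/another-post/,"A second comment, with a comma in it."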

import csv
import json
from datetime import datetime
from dateutil.parser import parse
from uuid import uuid4

now = datetime.now()
pages = {}
site_url = 'https://oakreef.ie'
date_format = "%Y-%m-%dT%H:%M:%SZ"

my_id = "ADMIN USER UUID"

# Read the old Cusdis comments and group them by the page they were posted on
with open('comments.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',', quotechar='"')
    for row in csv_reader:
        author, date, url, text = row
        date = parse(date)

        if url not in pages:
            pages[url] = {'comments': []}

        pages[url]['comments'].append({
            'author': author,
            'date': date,
            'text': text
        })

# Load the Comentario export to use as the base for the combined import file
with open('basedata.json') as json_file:
    data = json.load(json_file)

domainId = data['pages'][0]['domainId']

for url, page in pages.items():
    # Add a page entry for each URL that had old comments
    page_id = str(uuid4())
    data['pages'].append({
        'createdTime': now.strftime(date_format),
        'domainId': domainId,
        'id': page_id,
        'isReadonly': False,
        'path': url,
    })

    # Add each old comment as an anonymous, pre-approved comment on that page
    for comment in page['comments']:
        comment_id = str(uuid4())
        data['comments'].append({
            'authorCountry': 'IE',
            'authorName': comment['author'],
            'createdTime': comment['date'].strftime(date_format),
            'deletedTime': '0001-01-01T00:00:00.000Z',
            'editedTime': '0001-01-01T00:00:00.000Z',
            'html': f"<p>{comment['text']}</p>\n",
            'id': comment_id,
            'isApproved': True,
            'isDeleted': False,
            'isPending': False,
            'isSticky': False,
            'markdown': comment['text'],
            'moderatedTime': comment['date'].strftime(date_format),
            'pageId': page_id,
            'score': 0,
            'url': f"{site_url}{url}#comentario-{comment_id}",
            'userCreated': '00000000-0000-0000-0000-000000000000',
            'userModerated': my_id
        })

with open('import.json', 'w') as import_file:
    json.dump(data, import_file)

When that was done I put it away for a while as I wasn’t feeling well, and eventually came back to do the replies. I, again, manually went through all the replies I had made to comments on the old system and made a CSV file with the reply date, the URL of the page, the UUID of the parent comment as it exists in the new Comentario system, the UUID of the page the parent comment is on in the new Comentario system, and the text of the reply.

A few things are important to note about this:

  1. It was a pain in the hole. If I had done the replies at the same time as the rest of the comments I could have used the UUIDs I was already generating in the script rather than hunting them down manually and putting them into a CSV.
  2. The initial upload failed because Comentario apparently couldn’t match the page and user IDs to what was in the database, and it needed those to be in the import file. I got around this by doing another export and copying the entries for pages and commenters from that into the new file before uploading. This was not a good way to do it! It could have gone badly or had unexpected side effects. Again, if you’re doing this, do not import comments and replies as two separate steps!
  3. It still didn’t fully work anyway. My replies did import and do show up on the right pages, but they are not nested properly as replies. It’s like looking at a comment section on a very old YouTube video where reply chains are broken and everything just displays as individual comments. I don’t think I’m going to bother trying to fix this, as I don’t have that many comments on this site and I think everything reads understandably as it is, but if you want to try this approach you will want to figure out a way of not fucking up importing the replies.
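
For reference, the replies.csv rows for the second script are the reply date, the page path, the parent comment’s UUID, the page’s UUID and the reply text. The UUIDs and text in this example row are made up:

2023-08-03 10:00,/posts/example-post/,6f1e2d3c-4b5a-4c6d-8e7f-901a2b3c4d5e,0a1b2c3d-4e5f-4a6b-8c7d-9e0f1a2b3c4d,"Thanks very much!"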

import csv
import json
from dateutil.parser import parse
from uuid import uuid4

site_url = 'https://oakreef.ie'
date_format = "%Y-%m-%dT%H:%M:%SZ"

my_id = "ADMIN USER UUID"

# Minimal Comentario import file containing only the replies
data = {
    'version': 3,
    'comments': [],
}

with open('replies.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',', quotechar='"')
    for row in csv_reader:
        date, url, parent_id, page_id, text = row
        date = parse(date)

        comment_id = str(uuid4())

        # Each reply is attributed to my admin user and attached to the
        # existing page and parent comment by the UUIDs taken from the CSV
        data['comments'].append({
            'authorCountry': 'IE',
            'createdTime': date.strftime(date_format),
            'deletedTime': '0001-01-01T00:00:00.000Z',
            'editedTime': '0001-01-01T00:00:00.000Z',
            'html': f"<p>{text}</p>\n",
            'id': comment_id,
            'isApproved': True,
            'isDeleted': False,
            'isPending': False,
            'isSticky': False,
            'markdown': text,
            'moderatedTime': date.strftime(date_format),
            'pageId': page_id,
            'parentId': parent_id,
            'score': 0,
            'url': f"{site_url}{url}#comentario-{comment_id}",
            'userCreated': my_id,
            'userModerated': my_id
        })

with open('reply-import.json', 'w') as import_file:
    json.dump(data, import_file)

My avatar

One last thing: Comentario doesn’t allow GIF avatars, but I like my sparkly Jupiter. Looking at the Postgres database I could see that user avatars are simply stored as binary data in the cm_user_avatars table, in three sizes (avatar_l, avatar_m and avatar_s) corresponding to 128×128, 32×32 and 16×16 pixels respectively. So I made GIFs in the appropriate sizes, converted them to binary strings, and manually overwrote the avatar_l and avatar_m entries in cm_user_avatars (I left avatar_s as a JPEG).
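
The conversion itself is just hex-encoding the file contents into the \x literal that Postgres expects for bytea columns. A rough sketch of one way to do it (the file name is a placeholder):

# Read a resized GIF and print it as a Postgres bytea hex literal ('\x...')
with open('avatar_128x128.gif', 'rb') as f:
    print('\\x' + f.read().hex())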

UPDATE cm_user_avatars SET avatar_m = '\xBINARY_DATA_HERE'  WHERE user_id = 'UUID_HERE';
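
One way to run that is from a psql shell inside the db container, something along the lines of:

sudo docker compose -f docker-compose.yml exec db psql -U {INSERT POSTGRES USERNAME HERE} comentario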

This seems to work without any problems and my avatar in my own comments section is sparkly now.

Conclusions

That’s it. I hope I don’t have to worry too much about this setup again for some time.