cross-posted from: https://lemmy.cafe/post/1403198
> #### Overview
>
> This is a quick write-up of what I spent a few weeks trying to work out.
>
> The adventure happened at the beginning of October, so don't blindly copy-paste
> queries without making absolutely sure you're deleting the right stuff. Use
> `select` generously.
>
> When connected to the DB, run `\timing`. It prints the time taken to execute
> every query - a really nice way to get a sense of which queries take longer.
>
> I've had duplicates in `instance`, `person`, `site`, `community`, `post` and
> `received_activity`.
>
> The quick gist of this is the following:
>
> - Clean up
> - Reindex
> - Full vacuum
>
> I am now certain vacuuming is not, strictly speaking, necessary, but it makes me
> feel better to have all the steps I had taken written down.
>
> `\d` - list tables (look at it as `describe database`);
>
> `\d tablename` - describe table.
>
> `\o filename` - send all query output to a file on the filesystem. `/tmp/query.sql` was my choice.
>
> ___
>
> #### `instance`
>
> You need to turn `indexscan` and `bitmapscan` off to actually see the duplicates - otherwise the planner reads from the (broken) unique index, which is exactly what hides them:
> ``` sql
> SET enable_indexscan = off;
> SET enable_bitmapscan = off;
> ```
>
> The following selects the dupes
> ``` sql
> SELECT
> id,
> domain,
> published,
> updated
> FROM instance
> WHERE
> domain IN (
> SELECT
> domain
> FROM
> instance
> GROUP BY domain
> HAVING COUNT(*) > 1
> )
> ORDER BY domain;
> ```
>
> Deleting without using the index is incredibly slow - turn it back on:
> ``` sql
> SET enable_indexscan = on;
> SET enable_bitmapscan = on;
> ```
>
> ``` sql
> DELETE FROM instance WHERE id = ;
> ```
>
> Yes, you can build a fancier query to delete all the older/newer IDs at
> once. No, I do not recommend it. Delete one, confirm, repeat.
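>
> For reference, a sketch of what such a query could look like - it keeps the lowest `id` per domain and deletes the rest. I did not run it this way, and with the broken index the planner may not even see all the dupes, which is exactly why one-at-a-time is safer:
>
> ``` sql
> -- sketch only: run the SELECT version first and sanity-check the victims
> DELETE FROM instance a
> USING instance b
> WHERE a.domain = b.domain
> AND a.id > b.id;
> ```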
>
> At first I was deleting the newer IDs; then, after noticing the same instances
> were still getting new IDs, I swapped to targeting the old ones. After noticing
> *the same god damn instances* **still** getting new duplicate IDs, I had to dig
> deeper and, by sheer luck, discovered that I needed to `reindex` the database
> to bring it back to sanity.
>
> `Reindexing` the database takes a *very* long time - don't do that. Instead
> target the table - that should not take more than a few minutes. This, of
> course, all depends on the size of the table, but `instance` is naturally going
> to be small.
>
> ``` sql
> REINDEX TABLE instance;
> ```
>
> If `reindexing` succeeds - you have cleaned up the table. If not - it will yell
> at you with the name of the first index it fails on. Rinse and repeat until it's happy.
>
> Side note - it is *probably* enough to only `reindex` the index that's failing,
> but at this point I wanted to ensure *at least* the whole table is in a good
> state.
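>
> If you want to try the narrower route, you can list the table's indexes and rebuild only the offending one. The index name below is made up - use whatever the error message or the listing gives you:
>
> ``` sql
> -- find the index names on the table
> SELECT indexname FROM pg_indexes WHERE tablename = 'instance';
>
> -- rebuild just the one that fails (hypothetical name)
> REINDEX INDEX instance_domain_key;
> ```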
>
> ___
>
> Looking back - if I could redo it - I would delete the new IDs only, keeping the
> old ones. I have no evidence, but I think getting rid of the old IDs introduced
> more duplicates in other related tables down the line. At the time, of course,
> it was hard to tell WTF was going on and making a *wrong* decision was better
> than making *no* decision.
> ___
>
> #### `person`
>
> The idea is the same for all the tables with duplicates; however, I had to
> modify the queries a bit due to small differences.
>
> What I did at first, and you **shouldn't** do:
>
> ```sql
> SET enable_indexscan = off;
> SET enable_bitmapscan = off;
>
> DELETE FROM person
> WHERE
> id IN (
> SELECT id
> FROM (
> SELECT id, ROW_NUMBER() OVER (PARTITION BY actor_id ORDER BY id)
> AS row_num
> FROM person) t
> WHERE t.row_num > 1 limit 1);
> ```
>
> The issue with the above is that it, again, runs a `delete` without using the
> index. It is horrible, it is sad, it takes forever. Don't do this. Instead,
> split it into a `select` without the index and a `delete` with the index:
>
> ```sql
> SET enable_indexscan = off;
> SET enable_bitmapscan = off;
>
> SELECT
> a.id, a.actor_id, a.name
> FROM person a
> JOIN person b
> ON
> a.id > b.id
> AND
> a.actor_id = b.actor_id;
> ```
>
> ``` sql
> SET enable_indexscan = on;
> SET enable_bitmapscan = on;
>
> DELETE FROM person WHERE id = ;
> ```
>
> `person` had dupes *into the thousands* - I just didn't have enough time at that
> moment and started deleting them in batches:
>
> ``` sql
> DELETE FROM person WHERE id IN (1, 2, 3, ... 99);
> ```
>
> Again - yes, it can probably all be done in one go. I didn't do it that way, and
> so I'm not writing it down that way. This is where I used `\o` and then massaged the output into batches with coreutils. You can do that, or you can make the database do it for you (there's a sketch of that below) - I'm a better shell user than an SQL user.
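>
> For completeness, a sketch of letting the database do the batching (not what I ran; it keeps the lowest `id` per `actor_id` and removes at most 100 duplicates per invocation):
>
> ``` sql
> DELETE FROM person
> WHERE id IN (
> SELECT a.id
> FROM person a
> JOIN person b
> ON a.actor_id = b.actor_id
> AND a.id > b.id
> LIMIT 100
> );
> ```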
>
> `Reindex` the table and we're good to go!
>
> ``` sql
> REINDEX TABLE person;
> ```
>
> ___
>
> #### `site`, `community` and `post`
>
> Rinse and repeat, really. `\d tablename`, figure out which column is the one to
> use when looking for duplicates and `delete-reindex-move on`.
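>
> For example, a minimal sketch for `community` (assuming it also dedupes on `actor_id` - verify with `\d community` first):
>
> ``` sql
> SELECT a.id, a.actor_id, a.name
> FROM community a
> JOIN community b
> ON a.actor_id = b.actor_id
> AND a.id > b.id;
> ```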
>
> ___
>
> #### `received_activity`
>
> This one deserves a special mention, as it had *64 million rows* in the database
> when I was looking at it. Scanning such a table takes *forever* and, upon closer
> inspection, I realised there's nothing useful in it. It is, essentially, a log
> file. I don't like useless shit in my database, so instead of trying to find the
> duplicates, I decided to simply wipe most of it in hopes the dupes would go with
> it. I did it in 1-million-row increments, which took ~30 seconds per run on the
> single-threaded 2GB RAM VM the database runs on. I deleted in increments to keep
> the site usable - the `lemmy` backend starts timing out otherwise, and that's
> not great.
>
> Before deleting anything, though, have a look at how much storage your tables
> are taking up:
>
> ``` sql
> SELECT
> nspname AS "schema",
> pg_class.relname AS "table",
> pg_size_pretty(pg_total_relation_size(pg_class.oid)) AS "total_size",
> pg_size_pretty(pg_relation_size(pg_class.oid)) AS "data_size",
> pg_size_pretty(pg_indexes_size(pg_class.oid)) AS "index_size",
> pg_stat_user_tables.n_live_tup AS "rows",
> pg_size_pretty(
> pg_total_relation_size(pg_class.oid) /
> (pg_stat_user_tables.n_live_tup + 1)
> ) AS "total_row_size",
> pg_size_pretty(
> pg_relation_size(pg_class.oid) /
> (pg_stat_user_tables.n_live_tup + 1)
> ) AS "row_size"
> FROM
> pg_stat_user_tables
> JOIN
> pg_class
> ON
> pg_stat_user_tables.relid = pg_class.oid
> JOIN
> pg_catalog.pg_namespace AS ns
> ON
> pg_class.relnamespace = ns.oid
> ORDER BY
> pg_total_relation_size(pg_class.oid) DESC;
> ```
>
> Get the number of rows:
>
> ```sql
> SELECT COUNT(*) FROM received_activity;
> ```
>
> Delete the rows at your own pace. You can start with a small number to get an
> idea of how long each batch takes (remember `\timing`? ;) ).
>
> ``` sql
> DELETE FROM received_activity WHERE id < 1000000;
> ```
>
> **Attention!** Do let the `autovacuum` finish after every delete query.
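>
> One way to see whether a vacuum is still running is the standard progress view - an empty result means nothing is vacuuming at the moment:
>
> ``` sql
> SELECT relid::regclass AS "table", phase
> FROM pg_stat_progress_vacuum;
> ```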
>
> I ended up leaving ~3 million rows, which at the time represented ~ 3 days of
> federation. I chose 3 days as that is the timeout before an instance is marked
> as dead if no activity comes from it.
>
> Now it's time to `reindex` the table:
>
> ``` sql
> REINDEX TABLE received_activity;
> ```
>
> Remember the reported size of the table? If you check your system, nothing will
> have changed - that is because postgres *does not release* freed up storage to
> the kernel. It makes sense under normal circumstances, but this situation is
> anything but.
>
> Clean all the things!
>
> ```sql
> VACUUM FULL received_activity;
> ```
>
> *Now* you have reclaimed all that wasted storage to be put to better use.
>
> In my case, the *database* (not the table) shrunk by ~52%!
>
> ___
>
> I am now running a cronjob that deletes rows from `received_activity` that are
> older than 3 days:
>
> ``` sql
> DELETE FROM
> received_activity
> WHERE
> published < NOW() - INTERVAL '3 days';
> ```
>
> In case you're wondering if it's safe deleting such logs from the database -
> Lemmy developers seem to agree
> [here](https://github.com/LemmyNet/lemmy/pull/4130) and
> [here](https://github.com/LemmyNet/lemmy/pull/4131).
Why, when I search for a community my instance is not yet aware of, does it sometimes not bring back any posts in that community? This community is an example of that.
Most new additions will redline the CPU but then come back with a populated page for the community. Sometimes that doesn't happen and no amount of purging and retrying will cause it to.
I'm currently unable to subscribe to anything on Beehaw. I just get subscription pending and there is no content sent my way. Statistically they're also more likely to encounter no. 1 above. Can I fix this?
I’m using the Lemmy ansible installation method. I’ve been trying to add sendgrid to the postfix section of the config.hjson file on my local machine. But where do I add the API key and username? I used port 587 but nothing works. Can anyone help walk me through how to integrate sendgrid into Lemmy-Ansible? Thanks!!
The email section of config.hjson looks like this - did I do this right?
```
email: {
smtp_server: "smtp.sendgrid.net:587"
smtp_from_address: "noreply@{{ domain }}"
tls_type: "tls"
}
```
I was able to find the server location on my VPS under srv/lemmy/domain, so I can edit the lemmy.hjson file there if need be.
I put the middle finger emoji under their newest piracy announcement and they banned me from Lemmy World for a week. I can argue that it was ambiguous. It could have been against Lemmy for letting pirates back in, or it could have been against the pirates. So damn quick to judge.
https://github.com/LemmyNet/lemmy/issues/2943#issuecomment-1581485335
https://github.com/LemmyNet/lemmy/pull/2055
```
curl -X PUT http://localhost:1236/api/v3/community/hide \
-H "Content-Type: application/json" \
-d \
'{"community_id":3,"hidden":true,"reason":"controversal","auth":"Foo"}'
```
I haven't tried this, but maybe someone will find it useful and test it out. You could probably also easily do it in the database instead of using the API call.
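For the database route, something along these lines should be the equivalent of the API call above (a sketch, assuming the `hidden` boolean on the `community` table is what the endpoint toggles; it also skips whatever bookkeeping the API does, e.g. the mod log):
``` sql
-- hypothetical direct-DB version of the hide call above
UPDATE community SET hidden = true WHERE id = 3;
```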
cross-posted from: https://lemmy.ml/post/4489142
> *Originally asked in [#lemmy:matrix.org](https://matrix.to/#/!WTFQLNRWYipcHbkgbO:matrix.org/$Igu_UvaoB00qnYdbnjn2kXiEQ6JK3lQsq-UK75tZ48I?via=matrix.org&via=envs.net&via=mozilla.org)*
>
> ---
>
> ## 1 The Idea
>
> I've been thinking about writing a website to monitor Lemmy instances, much in the same vein as lemmy-status.org, to help people like me, who are interested in the operational health of their favourite servers, have a better understanding of patterns and be notified when things go wrong.
>
> *I thought I'd share my thoughts w/ you and ask for your feedback before going down any potential rabbit hole.*
>
> #### 1.1 Public-facing monitoring solution external to a cluster
>
> I don't wish to add any more complexity to a Lemmy setup. Rather I'm thinking about a solution which is totally unknown to a Lemmy server **AND** is publicly available.
>
> I'm sure one could get quite a decent monitoring solution which is *internal* to the cluster using Prometheus+Grafana but that is not the aim of this.
>
> #### 1.2 A set of key endpoints
>
> In the past there've been situations where a particular server's web UI would be a 404 or 503 while the mobile clients kept happily working.
>
> I'd like to query a server for the following major functionalities (and the RTT rate):
>
> * web/mobile home feed
> * web/mobile create post/comment
> * web/mobile search
>
> #### 1.3 Presenting stats visually via graphs
>
> I'd like to be able to look at the results in a visual way, preferably as graphs.
>
> #### 1.4 History
>
> I think it'd be quite cool (and helpful?) to retain the history of monitoring data for a certain period of time to be able to do some basic meaningful query over the rates.
>
> #### 1.5 Notification
>
> I'd like to be able to receive some sort of a notification when my favourite instance becomes slow or becomes unavailable and when it comes back online or goes back to "normal."
>
> ## 2 Questions
>
> ❓ Are you folks aware if someone has already done something similar?
>
> ❓ I'm not very familiar w/ Rust (I wrote only a couple of small toy projects w/ it.) Where can I find a list of API endpoints a Lemmy server publicly exposes?
>
> ❓ If there's no such list, which endpoints do you think would work in my case?
I'm looking into migrating (part of) my ever-growing pictrs data to something cheaper than the VPS disk it's currently on. Ideally I'd like to use minio's [Object Tiering](https://min.io/docs/minio/linux/administration/object-management/object-lifecycle-management.html) to migrate objects to/from cheaper storage.
Anybody using [Backblaze's cloud storage](https://www.backblaze.com/cloud-storage) product? What other options are out there and what potential pitfalls should I be aware of?
cross-posted from: https://campfyre.nickwebster.dev/post/89316
> I was getting close to hitting the end of my free object storage so there was time pressure involved haha.
>
> Seems to work but I haven't tested it too much. Currently running on my instance.
My pictrs volume got quite huge and I wanna delete pics cached from other instances. The thing is I'm not sure which pics are from my instance and which aren't, because they all have cryptic filenames. Does anyone know of a way to differentiate?
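Not something I've verified, but if your Lemmy version has an `image_upload` table (it maps local users' uploads to their pictrs aliases; newer versions may call it `local_image`), listing those aliases gives you the filenames that belong to your own instance - everything else in the pictrs volume is then likely remote cache:
``` sql
-- hypothetical: pictrs filenames of images uploaded by local users
SELECT pictrs_alias FROM image_upload;
```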
What does that mean? I currently cannot do *anything* on Lemmy World. Am I shadowbanned? I can read the feed but I cannot look at my notifications or profile nor can I upvote or downvote posts.
1: The cooldown period is way too long. I understand that it's way less than the five minutes of Reddit, but 30 seconds is glacial. It needs to be 10 or 15 seconds so that people can get their art pieces done faster without requiring an entire team to do it with them.
2: There needs to be more allowance for artistic expression. The human body in itself is art and it has been this way for centuries. All of a sudden open breasts are a crime? I thought that Lemmy was leftist and progressive. And I wasn't the only one whose phallus drawings got shamelessly censored by the supreme overlord of the event. What happened to equity in artistic expression? Also, the Fuck Cars logo was demolished because it had the middle finger in it. Like, hello, that's the logo? Grow up?
EDIT: This is not to say that political expression which harms minorities and the LGBTQIA+ community should be allowed.
I asked this as a comment on my previous post, but I still have some questions. 1: If ML stands for Mali and they're not from Mali, then why would they represent a foreign country? 2: Since it's not Mali, what does the ML stand for? If it's a pair of letters, it represents a country or stands for two words. Machine Learning? McCartney Lennon? Mega Lemmy?
cross-posted from: https://yiffit.net/post/868741
> This new version introduced a system so that your instance stops sending out content to other instances that are supposedly dead / offline.
>
> Unfortunately, for some reason there are false positives. When I compared the results of a curl request against the information in our Lemmy database, I found over 350 false positives.
>
> In the DB there is a table called "instance" which has a column called "updated". If the date on that column is older than 3 days, your server will stop sending any content to those instances.
>
> For some reason I had entries that were dated as last being alive in July, even though those instances were up the whole time. If an entry is incorrect, you can fix it by manually running an update statement that sets today's date. If your instance is not too large, you can safely update all entries to today's date and then check that federation works as expected for any new content created on your instance.
>
> The dead instances won't have an impact unless your instance is larger and generates more content. If you start noticing wonky behavior and don't want to check entries one by one, it might be easier to simply update them all and have Lemmy believe they're all alive.
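>
> A minimal sketch of both variants described above (check with a `SELECT` first):
>
> ``` sql
> -- fix a single instance you have verified to be alive
> UPDATE instance SET updated = NOW() WHERE domain = 'example.com';
>
> -- or mark every known instance as freshly seen
> UPDATE instance SET updated = NOW();
> ```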
>
> If you don't know how to access the database, run the following, where `domaincom` is your instance domain without the dot.
>
> 1. `docker exec -it domaincom_postgres_1 busybox /bin/sh`
>
> 2. `psql -U `
>
> (The default user is 'lemmy')
> You could technically do this in one single step, but it's good to know how to get shell access to the container itself.
>
> This should give you access to a postgres CLI interface.
> Use `\c` to connect, `\dt` to list tables and `\d+ tablename` to show a table's definition. You can also run SQL queries from there.
>
> Try with this query: `SELECT * from instance` to list all instances and their updated date.
>
> You can use other SQL queries to get better results or correct false positives. Just be careful with what you execute since there's no undo.
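>
> For example, to list the instances your server currently considers dead (i.e. the candidates for false positives):
>
> ``` sql
> SELECT domain, updated
> FROM instance
> WHERE updated < NOW() - INTERVAL '3 days'
> ORDER BY updated;
> ```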
## Current State
One controversial topic within the admin community is [Tor](https://www.torproject.org/). Many malicious actors that want to harm an instance hide behind the Tor network, which is why many instances block traffic originating from Tor. The most common approach is to block requests from [exit nodes](https://support.torproject.org/glossary/exit/), a list of which can be found [here](https://check.torproject.org/api/bulk). Blocking Tor is a legitimate choice that every instance operator must make for themselves, and I do not condemn anyone for doing so.
## Motivation for Tor
However, Tor is also a tool to use the Internet in an anonymous way, bypassing censorship or big firewalls. This means that there is a legitimate use case for the combination of Tor and Lemmy.
There is even an official [Lemmy documentation](https://join-lemmy.org/docs/administration/tor_hidden_service.html) on how to run a Lemmy instance as a hidden service.
## The Issue
There is, however, one significant issue at this point: Picture requests are leaking.
On the normal web, all requests go to `https://lemmy.tld/...`, including image requests that look like `https://lemmy.tld/pictrs/image/...`. In Tor, you access `http://xyz.onion/`, but the image requests still use `https://lemmy.tld/pictrs/image/...`. From a Tor perspective, this is not intended and defeats the purpose of a hidden service. Yes, you are still anonymous, but the traffic through the exit nodes is slow (traffic within the Tor network is »faster«) and not even necessary in this case.
The reason for this problem is that the image links are stored in full length in the database. For example, an image has the id `1a2b3c4d` and is stored in the DB as `https://lemmy.tld/pictrs/image/1a2b3c4d`. This leads to requests for images (of the same website you visit via Tor) taking the long route over the clear web.
## Proposed Fix
I have delved into the [lemmy-ui](https://github.com/LemmyNet/lemmy-ui/) source code and **[developed a fix](https://github.com/mikelauer/lemmy-ui)** for this problem. Unfortunately, this is not a universal solution and only works for our [QuantemToast (de/en)](https://postit.quantentoast.de) instance. However, it is easy to customize it for your instance. Just change the domain name in `src/shared/utils/app/substitute-image-url.ts` and build your own Docker image. It works by replacing the instance domain with the onion domain for image URLs (and the favicon).
Perhaps someone is interested in developing a general solution, but until then, those of you who want a Tor instance or just a Tor mirror (our use case) might like to take a look at my solution.
Edit: Use at your own risk.
## Please Note
Be aware that content from other instances might not be visible due to the Tor blocking mentioned above. Furthermore, federation is currently not supported for Tor instances. Federation traffic between instances is handled on the clear web.
If you just want a Tor mirror, you might want to consider using a [single onion service](https://blog.torproject.org/whats-new-tor-0298/) for better performance.
Edit: Changed fix link from commit to branch. Had to change something because of an icon leak.