uid 1000 just another blog

Deleting old Gitea actions

The other day I decided to setup CI/CD for my self-hosted Gitea instance. Logically, my first idea was to use the new Gitea Actions feature introduced with Gitea 1.19 early last year, which is apparently based on nektos/act. As someone who has only set up and used only very few and rather simple CI pipelines on GitHub and GitLab, I found setting up the runners and pipelines very confusing, and workflows that were supposed to run on GitHub did not work properly. After an hour or so I gave up on the built-in CI/CD and decided to set up Drone CI, which I already set up once in the past.

While disabling the actions feature disabled all runners and pipeline runs, old actions where still saved in the database and commit log, and clicking the little red cross ❌ (or green checkmark ✅ in the case of a successful run) lead to a 404 error page. As someone who likes to keep their system clean, I hate stuff like this - so I went to search for a solution 🕵🏼‍♂️. However, there is no official documentation on this and it is currently not even possible to delete workflow runs.

The database

Gitea supports severel different databases and I know that I went with SQLite, as I am the only user and the instance is hosted on a mini PC with little memory, similar to Intel NUC. So I stopped the docker containers, opened the volume (located at /var/lib/docker/volumes/gitea_data/_data) and opened the SQLite database using the sqlite3 ./gitea/gitea.db command.

Listing all tables using the .tables command lists the following:

access                     oauth2_grant
access_token               org_user
action                     package
action_artifact            package_blob
action_run                 package_blob_upload
action_run_index           package_cleanup_rule
action_run_job             package_file
action_runner              package_property
action_runner_token        package_version
action_schedule            project
action_schedule_spec       project_board
action_task                project_issue
action_task_output         protected_branch
action_task_step           protected_tag
action_tasks_version       public_key
action_variable            pull_auto_merge
app_state                  pull_request
attachment                 push_mirror
auth_token                 reaction
badge                      release
branch                     renamed_branch
collaboration              repo_archiver
comment                    repo_indexer_status
commit_status              repo_redirect
commit_status_index        repo_topic
commit_status_summary      repo_transfer
dbfs_data                  repo_unit
dbfs_meta                  repository
deploy_key                 review
email_address              review_state
email_hash                 secret
external_login_user        session
follow                     star
gpg_key                    stopwatch
gpg_key_import             system_setting
hook_task                  task
issue                      team
issue_assignees            team_invite
issue_content_history      team_repo
issue_dependency           team_unit
issue_index                team_user
issue_label                topic
issue_user                 tracked_time
issue_watch                two_factor
label                      upload
language_stat              user
lfs_lock                   user_badge
lfs_meta_object            user_blocking
login_source               user_open_id
milestone                  user_redirect
mirror                     user_setting
notice                     version
notification               watch
oauth2_application         webauthn_credential
oauth2_authorization_code  webhook

That are a lot of tables. However, action and some other tables starting with action_ look very promising. Let's take a look at the action table:

sqlite> SELECT COUNT(*) FROM action;
427041

That are more rows than I expected, to be honest. Let's view the table columns:

sqlite> PRAGMA table_info(action);
0|id|INTEGER|1||1
1|user_id|INTEGER|0||0
2|op_type|INTEGER|0||0
3|act_user_id|INTEGER|0||0
4|repo_id|INTEGER|0||0
5|comment_id|INTEGER|0||0
6|is_deleted|INTEGER|1|0|0
7|ref_name|TEXT|0||0
8|is_private|INTEGER|1|0|0
9|content|TEXT|0||0
10|created_unix|INTEGER|0||0

As I am not exactly familiar how Gitea works internally, this is not really helpful to me. act_user_id could have something to do with the Act Runner, but I am not really sure. However, just for a single repo of mine (which has 2 failed pipline runs), SELECT COUNT(*) FROM action WHERE repo_id = 166; counts 20 rows.

It would really helpful to see the actual Go code interacting with this table or a model struct, so let's clone the source code and search for interesing files.

gitea/models/activities/action.go looks interesting, as it could be a model representing the database. Indeed, the file contains the following struct:

type Action struct {
    ID          int64 `xorm:"pk autoincr"`
    UserID      int64 `xorm:"INDEX"` // Receiver user id.
    OpType      ActionType
    ActUserID   int64            // Action user id.
    ActUser     *user_model.User `xorm:"-"`
    RepoID      int64
    Repo        *repo_model.Repository `xorm:"-"`
    CommentID   int64                  `xorm:"INDEX"`
    Comment     *issues_model.Comment  `xorm:"-"`
    Issue       *issues_model.Issue    `xorm:"-"` // get the issue id from content
    [...]
}

And the following enum:

const (
    ActionCreateRepo                ActionType = iota + 1 // 1
    ActionRenameRepo                                      // 2
    ActionStarRepo                                        // 3
    ActionWatchRepo                                       // 4
    ActionCommitRepo                                      // 5
    ActionCreateIssue                                     // 6
    [...]
)

So this table apparently has nothing to do with the relatively new Gitea Actions feature, but rather logs all sort of stuff happening on the server, like new repositories being created or starred.

The other ones look a bit more promising, for example the action_run:

sqlite> PRAGMA table_info(action_run);
0|id|INTEGER|1||1
1|title|TEXT|0||0
2|repo_id|INTEGER|0||0
3|owner_id|INTEGER|0||0
4|workflow_id|TEXT|0||0
[...]
8|commit_sha|TEXT|0||0
9|event|TEXT|0||0
10|is_fork_pull_request|INTEGER|0||0
11|event_payload|TEXT|0||0
12|status|INTEGER|0||0
13|started|INTEGER|0||0
14|stopped|INTEGER|0||0
[...]

Getting all action runs for my repo of interest confirms this:

sqlite> SELECT commit_sha FROM action_run WHERE repo_id = 166;
3b75386c83d7bc9d89721615bbc1f79a0c7f7b24

This is the commit hash of the only commit in this repository which has a workflow run associated with it. Additionally, Drone CI workflows appear not be be listed here so, having an up-to-date backup of the database, I ran DELETE FROM action_run to delete all rows in this table. Similarly, I deleted all rows from action_run_job, action_task, action_runner (which still included my soft-deleted runners that I previously removed using the Admin UI), action_runner_token, action_task_step and action_tasks_version.

Great, no more runs in the database must mean these failed workflows should no longer appear when viewing the commit history of repositories - or so I thought.

Digging deeper: Templates

Turns out there were still these annoying crosses ❌ when viewing the commit history of repository 166, even though I deleted all these runs associated with the repo and exact commit. How can this be?

From looking through the source code, I know that Gitea uses the html/template package to render its UI. So I traced the exact location where these crosses and checkmarks are rendered from

When going back to the output of the .tables command, we can see that we have multiple tables that could be related to this commit status template (commit_status, commit_status_index, commit_status_summary). When listing all columns using PRAGMA table_info, we can confirm this suspicion. Turns out the CI/CD status of each commit is stored in this table and not recalculated every time the commits are listed - which makes sense.

However, when listing the table contents, we can see that this not only contains the status for Gitea Actions, but also for Drone CI pipeline runs - which also makes sense, since we can see this status as well in the UI:

sqlite> SELECT * FROM commit_status WHERE repo_id = 166;
[...]
1082|4|166|failure|[...]|/<username>/<repo-name>/actions/runs/1/jobs/0|Failing after 54s|[...]|<job-name> / build (push)|-2|1726775421|1726775421
1083|5|166|pending|[...]|/<username>/<repo-name>/actions/runs/1/jobs/0|Waiting to run|[...]|<job-name> / build (push)|-2|1726792076|1726792076
1084|6|166|failure|[...]|/<username>/<repo-name>/actions/runs/1/jobs/0|Failing after 53s|[...]|<job-name> / build (push)|-2|1726792131|1726792131
1085|1|166|pending|[...]|https://drone.<hostname>/<username>/<repo-name>/1|Build is pending|[...]|continuous-integration/drone|1|1726881188|1726881188
[...]

This also applies to the tables commit_status_index and commit_status_summary.

Cleaning up

As Gitea Actions uses relative URLs starting with a slash, I decided to delete everying from those tables where the target URL does not start with http:

sqlite> DELETE FROM commit_status WHERE target_url NOT LIKE 'https%';
sqlite> DELETE FROM commit_status_summary WHERE target_url NOT LIKE 'https%';

This leaves us with only Drone CI runs. As the commit_status_index table does not have a target URL column, I manually deleted all runs not matching the only commit, which Drone ran the pipeline for:

sqlite> DELETE FROM commit_status_index WHERE sha != '4b75968858ac843341fc76bd5383f6ddd966b630';

When starting the Gitea server again and viewing the commit history, all failed pipelines and dead links are gone.

Conclusion

While this is certainly a niche problem and most of the very few people who come across this issue will probably not mind the past workflow runs in the UI, I think this really nicely shows the advantages of open-source software. If Gitea were closed-source, it would be very difficult and time-consuming to trace down the root cause of this issue and fix it yourself.


home