My Mid-point project report

My mid-point project progress report with Outreachy.

Hi there,

I’m halfway through my internship, about six weeks in, and it has been a full-blown learning experience. When I wrote my initial project timeline, I already had a clear idea of what my project would look like and how it would function, so I made a UI design:

For my first week, I planned to clarify expectations with my mentors, continue working on the UI screens, seek their feedback, and revise the design I already had.

But I guess the design I had was enough for a start, so my mentors put together a modified timeline that was in line with my project:

After going through this new timeline, I was like “Oh yeah! I definitely should have added this!” It was very important for me to have all the tools set up in my first week, because how else would I start the work? I’m grateful to my mentors for walking me through the setup process. This included signing up for Wikimedia’s developer tools, deploying a simple “Hello World” tool on Toolforge (Wikimedia’s hosting platform for tools), getting set up on GitLab (for pushing code and sending merge requests; it does a lot of other things too, but these are the main ones ;)), and setting up the project’s research page on Wikimedia (I especially enjoyed this one because it was my first edit to Wikimedia, and my mentors made me appreciate it by celebrating it with me). The timeline was definitely manageable, and I completed it before the week ended.

For the following week, my initial timeline had me starting the back-end development once Toolforge was set up:

This was very much in line with what my mentors had planned for me:

Well, the only difference was that I had to call the LiftWing API and not only the Wikipedia API. So, in the second week, I began reading the LiftWing API documentation. The LiftWing API is used to extract Wikipedia’s article quality scores and quality signal metrics, like the number of headings, images, categories, infoboxes, and other signals that bring Wikipedia articles up to standard. While looking into how to extract these signals, I realized that I first needed the articles’ Wikipedia IDs. To get those, I had to look into Wikipedia’s MediaWiki API. I learned the properties needed to extract the Wikipedia ID and wrote the functions for it in FastAPI (Python). After connecting my endpoint, I tested a list of articles in the docs and got these results:
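For readers curious about the ID lookup step, here is a minimal sketch of how it could work. It is not the tool's actual code. It assumes that "Wikipedia ID" means the page ID (the latest revision ID is collected too, in case that is what the scoring call needs), and it uses the MediaWiki Action API's `action=query` with `prop=info`. The function names and the sample response, including its ID values, are made up for illustration, and no network request is made.

```python
from urllib.parse import urlencode

# English Wikipedia's Action API endpoint (other languages use their own host).
API_URL = "https://en.wikipedia.org/w/api.php"


def build_id_query(titles):
    """Build a query URL asking for basic page info for several titles."""
    params = {
        "action": "query",
        "prop": "info",               # includes pageid and lastrevid
        "titles": "|".join(titles),   # multiple titles are pipe-separated
        "format": "json",
        "formatversion": "2",         # "pages" comes back as a list
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_ids(response):
    """Map each existing title to its page ID and latest revision ID."""
    ids = {}
    for page in response.get("query", {}).get("pages", []):
        if page.get("missing"):
            continue  # the title does not exist on this wiki
        ids[page["title"]] = {
            "pageid": page["pageid"],
            "lastrevid": page.get("lastrevid"),
        }
    return ids


# Hypothetical response in the formatversion=2 shape; the numbers are invented.
SAMPLE_RESPONSE = {
    "query": {
        "pages": [
            {"pageid": 101, "ns": 0, "title": "Example Article", "lastrevid": 9001},
            {"ns": 0, "title": "No Such Article", "missing": True},
        ]
    }
}

if __name__ == "__main__":
    print(build_id_query(["Example Article", "No Such Article"]))
    print(extract_ids(SAMPLE_RESPONSE))
```

In a FastAPI app, a function like `extract_ids` would sit behind an endpoint that fetches the URL and passes the decoded JSON in.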

So here you can see the raw scores and the normalized scores. The normalized scores are what we are going to use, because they account for how many of each feature a short or long article is expected to have, and they also consider the language of the article. For example, articles in some languages are expected to include more images than articles in other languages. The potential needs are derived from the normalized scores. “1” is the highest and perfect score, for articles that have complete images, categories, headings, etc. So I used 0.5 as the benchmark: every feature that scored below it is displayed as one that needs editing:
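The benchmarking rule can be sketched in a few lines. This is only an illustration: the function name, the feature names, and the sample scores are assumptions, and only the 0.5 cut-off comes from the description above.

```python
# Features scoring below this normalized value are flagged for editing.
THRESHOLD = 0.5


def potential_needs(normalized_scores, threshold=THRESHOLD):
    """Return the features whose normalized score is below the threshold."""
    return sorted(
        feature
        for feature, score in normalized_scores.items()
        if score < threshold
    )


# Invented sample scores for one article (1.0 would be a perfect score).
sample_scores = {
    "images": 0.2,
    "headings": 0.8,
    "categories": 0.5,   # exactly 0.5 is not below the threshold
    "infoboxes": 0.0,
}

if __name__ == "__main__":
    print(potential_needs(sample_scores))  # ['images', 'infoboxes']
```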

For the following weeks (weeks 3 to 6), my initial timeline was to work on the UI, i.e., to display a table showing the article titles, scores, recommended tasks, etc.

My mentors’ timeline was in line with this too:

Here are my results, built with HTML, CSS, and JS:

In weeks 5 and 6, I worked on adding the categories feature and the filter by geography. Based on some UI/UX principles, I also cleaned up the topic names and allowed multiple filter selections. I did this by reading about the category property in the MediaWiki documentation and learning how to extract the articles that belong to a particular category. Then I wrote a function that returns a category based on the user’s search input, so that users can’t enter a category that does not exist. Here are the results:
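One possible way to do the category lookup is sketched below. It is an assumption about the approach, not the tool's real implementation. It uses two Action API list modules: `list=prefixsearch` restricted to the Category namespace (14) to suggest existing categories, and `list=categorymembers` to fetch the articles in a chosen category. The helper names and the sample response are hypothetical.

```python
def category_search_params(query, limit=10):
    """Parameters for suggesting existing categories that match the input."""
    return {
        "action": "query",
        "list": "prefixsearch",
        "pssearch": query,
        "psnamespace": "14",   # 14 is the Category namespace
        "pslimit": str(limit),
        "format": "json",
        "formatversion": "2",
    }


def category_members_params(category_title, limit=50):
    """Parameters for listing the articles inside one category."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category_title,   # e.g. "Category:Physics"
        "cmnamespace": "0",          # main (article) namespace only
        "cmlimit": str(limit),
        "format": "json",
        "formatversion": "2",
    }


def titles_from(response, list_name):
    """Pull the page titles out of a list-style query response."""
    return [item["title"] for item in response.get("query", {}).get(list_name, [])]


def resolve_category(user_input, suggestions):
    """Return the matching category title, or None if it does not exist."""
    wanted = user_input.strip().lower()
    for title in suggestions:
        name = title.split(":", 1)[-1]  # drop the "Category:" prefix
        if name.lower() == wanted:
            return title
    return None


# Hypothetical prefixsearch response; page IDs are invented.
SAMPLE_SEARCH = {
    "query": {
        "prefixsearch": [
            {"ns": 14, "title": "Category:Physics", "pageid": 1},
            {"ns": 14, "title": "Category:Physicists", "pageid": 2},
        ]
    }
}

if __name__ == "__main__":
    suggestions = titles_from(SAMPLE_SEARCH, "prefixsearch")
    print(resolve_category("physics", suggestions))
    print(resolve_category("phys", suggestions))
```

Only offering titles that the API itself returned is what keeps a non-existent category from ever reaching the article lookup.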

If I were starting the project over, I would begin by signing up for the tools and installing the needed Python packages. Getting approval for Toolforge took 2 days, so I would request it right at the beginning to avoid losing time, and then follow the rest of my timeline, since it was in line with what my mentors had for me.

For the second half of the internship, I hope to build on the work that has already been completed, making sure that all core backend and frontend features are finalized, tested, and fully documented on-wiki. The remaining weeks will focus on extending the tool’s usefulness through the addition of an on-wiki output feature. This would allow users to generate wikitext outputs, such as checklists or worklist templates, directly from the tool’s analysis. These outputs could then be copied and pasted into a user page or project page, making it easier for organizers and editors to turn analysis results into editing tasks.
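To give a rough idea of what such a wikitext output could look like, here is a small sketch. The feature is still planned, so everything here is an assumption: the function name, the section heading, and the input shape (article title mapped to a list of suggested tasks) are all hypothetical. It only uses standard wikitext: a level-2 heading, bullet items, and internal links.

```python
def to_wikitext_checklist(results, heading="Articles to improve"):
    """Turn {article title: [suggested tasks]} into a wikitext bullet list."""
    lines = [f"== {heading} =="]
    for title, tasks in results.items():
        if not tasks:
            continue  # nothing to do for this article, so leave it out
        lines.append(f"* [[{title}]]: " + ", ".join(tasks))
    return "\n".join(lines)


# Invented analysis results for illustration.
sample_results = {
    "Example Article": ["add images", "add an infobox"],
    "Complete Article": [],
    "Another Article": ["add headings"],
}

if __name__ == "__main__":
    print(to_wikitext_checklist(sample_results))
```

The printed text could be pasted straight into a user page or project page, where each title becomes a clickable link.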

During Week 10 (February 9–15), the focus will be on designing a simple progress dashboard concept that can track the number of articles analyzed by a user, the number of potential tasks identified, and, if feasible, allow users to manually mark tasks as completed. This feature would provide lightweight progress tracking without introducing complex user account requirements. In the same week, I will begin a research and feasibility study to detect missing or unhealthy links in the article wikitext. This may include checking missing internal links against a list of common misspellings and investigating the use of available APIs to identify dead or unreachable external domains.
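As a starting point for that feasibility study, the misspelling check could be sketched roughly like this. It is an exploratory idea, not a finished design: the regular expression is a simplification that skips any link target containing a colon (so files, categories, and interwiki links are ignored, along with the occasional article title that has a colon), and the misspelling list is a hypothetical stand-in for a real one.

```python
import re

# Captures the target of [[Target]], [[Target|label]] and [[Target#Section]].
LINK_PATTERN = re.compile(r"\[\[([^\[\]|#]+)")


def internal_link_targets(wikitext):
    """Return the article titles that the wikitext links to."""
    targets = []
    for match in LINK_PATTERN.finditer(wikitext):
        target = match.group(1).strip()
        if target and ":" not in target:  # crude namespace filter
            targets.append(target)
    return targets


def suspected_misspellings(targets, known_misspellings):
    """Map each link target found in the misspelling list to its correction."""
    return {
        target: known_misspellings[target.lower()]
        for target in targets
        if target.lower() in known_misspellings
    }


# Hypothetical data for illustration.
SAMPLE_WIKITEXT = (
    "See [[Albert Einstien|Einstein]] and [[Physics#History]]. "
    "[[File:X.png|thumb]] [[Category:Science]]"
)
KNOWN_MISSPELLINGS = {"albert einstien": "Albert Einstein"}

if __name__ == "__main__":
    found = internal_link_targets(SAMPLE_WIKITEXT)
    print(found)
    print(suspected_misspellings(found, KNOWN_MISSPELLINGS))
```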

In Week 11 (February 16–22), the focus will shift to more advanced features and research related to new editors. I will research existing tools or AI-based approaches for detecting biased or non-inclusive language and propose a simple, rule-based filter as a proof-of-concept to flag commonly known terms. In parallel, I will draft a section for the final report analyzing how the tool lowers barriers for new editors and proposing a method to track whether event participants who use the tool return to edit. This may include ideas such as a voluntary opt-in survey or a lightweight tagging mechanism that respects user privacy.
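A rule-based proof-of-concept of this kind could be as simple as the sketch below. The term list is a tiny hypothetical example, not a vetted vocabulary, and the function name is made up. It does whole-word, case-insensitive matching and reports where each term occurs along with a suggested alternative.

```python
import re


def flag_terms(text, term_suggestions):
    """Return (position, matched text, suggestion) for each flagged term."""
    findings = []
    for term, suggestion in term_suggestions.items():
        # \b keeps "man" from matching inside "manual", for example.
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            findings.append((match.start(), match.group(0), suggestion))
    return sorted(findings)  # ordered by position in the text


# Hypothetical term list for illustration only.
SAMPLE_TERMS = {"chairman": "chairperson", "mankind": "humanity"}

if __name__ == "__main__":
    for position, word, suggestion in flag_terms(
        "The chairman spoke about mankind.", SAMPLE_TERMS
    ):
        print(f"{position}: '{word}' -> consider '{suggestion}'")
```

A filter like this only flags candidates for a human editor to review; it cannot judge context on its own.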

The final phase of the internship, covering Weeks 12 and 13, will focus on wrap-up activities, including documentation, final integration, and handover. In Week 12 (February 23–March 1), my primary objective will be to complete all project documentation. This will involve creating comprehensive on-wiki user guides and technical documentation, ensuring the codebase is clean, well-commented, and robustly deployed on Toolforge, and integrating any completed nice-to-have features.

In Week 13 (March 2–6), I will prepare and deliver a final presentation and demo for mentors and the wider community. I will also submit the final Outreachy report in the form of a Diff post, synthesizing the work completed during the internship, feedback received, and potential future directions for the project, particularly in relation to new editor retention. The final day will focus on ensuring the project is in a strong, maintainable state.

So let’s go on this journey together, shall we?