It turns out this blog is not dead, it was only sleeping! You see, my mind works in a somewhat obsessive way. Whenever I find something I enjoy, I tend to pour all my being into it, and it becomes my one and only focus of attention. For a period of time, I live exclusively for that goal, leaving everything else aside. Eventually I burn out, say "fuck it" and leave it for something else. I am aware that this behavior is somewhat pathological, even though I get extremely productive when I stay in the zone. Nevertheless, I make some effort to create balance in my life, with greater or lesser success. In fact, this blog was created out of the intention to subject myself to the discipline of some sort of posting schedule, which I managed to maintain... for a few days. But here we are again!
Leaving my self-diagnosed psychological evaluation aside, the actual point of this post is to answer a request from a fellow Mastodoner asking how UnconsciousBot, one of my latest digital minions, works. This Mastodon bot procedurally generates text based on Jung's writings, works that are full of symbolic synchronicity potential by themselves, but even more so when we apply some randomness! As with everything we have previously seen here, the results look more complex than the implementation. The human race is so prolific that it is increasingly difficult to create something that does not already exist, and in this case too it was just a matter of finding the proper library. Thanks to ddycai, we can use his random sentence generator to create random sentences from an input file using Markov chains. The library acts as a handy wrapper around the Natural Language Toolkit (NLTK), an interesting framework designed to work with human language data.
So how does it work? To generate text, we feed the algorithm a text file whose sentences have been tokenized. A random sentence is selected, and its first group of words is searched for across the text file for coincidences. The sentence is then joined with another random sentence that matches those words, and the process repeats until a period is reached. In this way, the bigger the group of words, the stricter the results will be. For instance, many matches can be found if the group consists of two words, while a group of five words will mostly return literal sentences from the text. For UnconsciousBot I only used Aion as the text source, but the bigger the input, the more possibilities. At some point I would like to improve it by adding more of Jung's books, but even as it is now, it tends to give interesting results from time to time. There are also some spacing issues in the original text file that I should fix at some point. Maybe tomorrow.
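To make the idea concrete, here is a minimal toy sketch of that chaining process in plain Python. This is not ddycai's actual implementation, and names like build_chain are made up for illustration: it simply maps every group of n consecutive words to the words that can follow it, then walks the chain from a random starting point until it hits a period.

import random
from collections import defaultdict

def build_chain(tokens, n):
    # Map every group of n consecutive words to the words that follow it
    chain = defaultdict(list)
    for i in range(len(tokens) - n):
        chain[tuple(tokens[i:i + n])].append(tokens[i + n])
    return chain

def generate(chain, n, max_words=50):
    # Start from a random group of words and keep chaining matches
    key = random.choice(list(chain.keys()))
    output = list(key)
    for _ in range(max_words):
        followers = chain.get(key)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
        if word == '.':  # stop once a period is reached
            break
        key = tuple(output[-n:])
    return ' '.join(output).capitalize()

# Toy corpus; UnconsciousBot uses the tokenized text of Aion instead
corpus = "the self is a union of opposites . the self is the archetype of wholeness . the shadow is a moral problem ."
chain = build_chain(corpus.split(), n=2)
print(generate(chain, n=2))

With groups of two words there are usually several possible continuations at each step, while larger groups pin the output to near-verbatim passages from the source, which is exactly the strictness trade-off described above.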
Here is the code, which is an adapted version of ddycai's script; it can also be found on my GitHub:
from mastodon import Mastodon
import nltk.data
from nltk import word_tokenize
from sentence_generator import Generator

# Mastodon token and domain
mastodon = Mastodon(
    access_token='asdf',
    api_base_url='https://domain.com'
)

# How many words we are taking (the bigger the variable, the stricter it will be)
words = 2

# We open the file and apply some tokenizing magic
with open("text.txt", 'r', encoding='utf-8') as f:
    sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
    sents = sent_detector.tokenize(f.read().strip())
sent_tokens = [word_tokenize(sent.replace('\n', ' ').lower()) for sent in sents]
generator = Generator(sent_tokens, words)

# We capitalize the first word to make it pretty, and only accept the
# result if it is smaller than Mastodon's character limit
text = generator.generate().capitalize()
while len(text) >= 500:
    text = generator.generate().capitalize()

# Send result to Mastodon
mastodon.status_post(text)
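One note if you want to run this yourself: NLTK does not ship the Punkt tokenizer data by default, so the 'tokenizers/punkt/english.pickle' load will fail until you download it once. This assumes you have installed NLTK and the Mastodon.py package, and placed ddycai's sentence_generator module next to the script:

import nltk
nltk.download('punkt')  # fetches the Punkt sentence tokenizer used above

From there, a cron job or similar scheduler can run the script periodically so the bot posts on its own.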
I hope this entry was somewhat useful! If not, then I guess I failed miserably. This was post #11 in the #100DaysToOffload challenge. At this point, my personal goal with the challenge is to reach a hundred posts, no matter how long it takes, since I am more fond of quality than quantity. As always, thank you for reading, and see you next time.