I remember doing JavaScript and thinking "thank god for the textarea element, I could never reimplement this," but it turns out it's really not too bad.

@neauoire Looking forward to your Arabic-compatible text editor :)

@cancel it wouldn't be hard to remix this into a RTL editor

@neauoire that would be step 1. Then you would need to support both LTR and RTL at the same time. Then you need to figure out how to render the actual Arabic script :D

@cancel I'd have to first learn arabic or hebrew.

I wonder if arabic script would fit in a char?

@neauoire nope. It doesn't even have individual characters when it comes to drawing them, as far as I know.
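A quick Python sketch of the "would it fit in a char" question, assuming "char" means a single 8-bit byte as in C and that the text is stored as UTF-8 (both are assumptions, the thread doesn't say). It only covers storage; it says nothing about how the letters are drawn.

```python
# Assumption: "char" = one 8-bit byte, text encoded as UTF-8.
word = "\u0642\u0644\u0628"  # the word قلب: qaf, lam, beh

# Number of UTF-8 bytes each letter needs on its own.
bytes_per_letter = [len(ch.encode("utf-8")) for ch in word]
utf8_len = len(word.encode("utf-8"))

for ch, n in zip(word, bytes_per_letter):
    print(f"U+{ord(ch):04X} needs {n} bytes in UTF-8")
print(f"{len(word)} code points, {utf8_len} bytes total")
```

So even before shaping comes into it, a basic Arabic letter is wider than one byte in UTF-8.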

@cancel ah yeah that's true, the diacritics and stuff are added above the characters. It's really not something I'd feel comfortable tackling haha.

@neauoire same. I think it would take me years of study.

@chirrolafupa @cancel @neauoire lol I wrote and deleted a response to your original post about the text editor with this exact thought. getting non-Latin scripts right in the general case is in the same category of difficulty as getting cryptography right, effectively impossible for an individual. you need to use a library like harfbuzz, and then there is considerable work on top of that. the end result will not be small or simple, unfortunately, because human writing is not small or simple...

@chirrolafupa @cancel @neauoire that's all true within the current paradigm of digital text rendering, which is ultimately derived from movable type theory, and that alone disadvantages cursive languages (ask me some time about the history of the printing press in the Middle East). there's nothing preventing a radically different approach to text rendering and fonts, however! things might be different if taken from a different perspective.

@nasser I'd love to hear more. Both about the history angle and about alternative approaches.

@chirrolafupa @cancel @neauoire

@akkartik @chirrolafupa @cancel @neauoire so much to say! the first printed arabic text was a quran made in venice in the 16th century, the idea being that they would sell it to muslim people as a product. they pulled it off, but the result was so catastrophically ugly that no one wanted it lol. attached images are the venetian quran and an arabic quran from the same time period. I don't know how obvious the differences are to someone who doesn't read arabic, but trust me it sucks.

@akkartik @chirrolafupa @cancel @neauoire this moment alone predicts how and why computers built in the West will fumble when it comes to Arabic. The printers had to figure out a way to represent Arabic letters as blocks and then fit the blocks together, because that's how the technology of the printing press worked. That's completely counter to Arabic as a language! you don't think about individual letters when you are writing Arabic, you think about whole words, because letters can bend/change


@akkartik @chirrolafupa @cancel @neauoire it wasn't until the 18th and 19th century that you started to get competently printed Arabic text, mostly in Cairo and Beirut. they did this by carving many many variations for each letter to accommodate the different ways it might bend and shape itself in a word, in addition to carving whole words into blocks. the result is that you have hundreds upon hundreds of blocks, as opposed to a single block for each letter.
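Unicode still carries a trace of this "one block per letter shape" idea in its legacy Arabic Presentation Forms-B block, where the positional shapes of a letter have their own code points. A small Python sketch using the standard `unicodedata` module, with the letter beh picked purely as an illustration; modern fonts choose these shapes during shaping rather than through these code points.

```python
import unicodedata

# The four positional shapes of BEH in Arabic Presentation Forms-B.
# Choosing BEH is an arbitrary example; most letters have similar sets.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Compatibility normalization (NFKC) folds each shape back to the
# abstract letter it is a shape of.
bases = {unicodedata.normalize("NFKC", form) for form in BEH_FORMS}

for form in BEH_FORMS:
    base = unicodedata.normalize("NFKC", form)
    print(describe(form), "->", describe(base))
```

All four shapes fold back to the single abstract letter, which is the "hundreds of blocks versus one block per letter" distinction in miniature.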

@akkartik @chirrolafupa @cancel @neauoire this is a hack of course and like all hacks it has its limitations. once the typewriter comes around, you kind of do have to pick a single image for each letter. in movable type it's inconvenient to have hundreds and hundreds of blocks, but in a typewriter it's just impossible. as I understand this is when arabic as a written language gets most of its nuance and flourish flattened out of it to accommodate western tech

@akkartik @chirrolafupa @cancel @neauoire so the combination of movable type (letters have a finite number of visual presentations, words are constructed by arranging rectilinear images of letters in two-dimensional space) and typewriters (every letter is connected to a button) is fundamentally incompatible with arabic as a writing system, and arabic has had to change to accommodate them, and is still not very well supported as a result. things only get worse with computers 😬 </history>

@nasser thanks for sharing! i love to revisit and challenge the "most basic assumptions" that i have as a "son of the west"
@akkartik @cancel @neauoire

@chirrolafupa @akkartik @cancel @neauoire i think we all grow by challenging the assumptions we were raised with!

@nasser @akkartik @chirrolafupa @cancel I'm convinced, not gonna go forward with the project, too much of a minefield.

@neauoire @akkartik @chirrolafupa @cancel i don't mean to be a bummer... fwiw i think systems that acknowledge their limitations (e.g. 'this system works for the latin alphabet only') are fine. it's the half-solutions that are a problem.

@neauoire @nasser @akkartik @chirrolafupa @cancel one question I meant to ask during FennelConf when I presented my structural editor was that while I know the love2d UI would be unable to support Arabic, it seems like terminals exist that do an OK job of it. is it easier for terminal/curses-based programs to support Arabic text as long as they're running inside a harfbuzz-enabled terminal, or are there still a lot of minefields there?

@technomancy @neauoire @akkartik @chirrolafupa @cancel still minefields, particularly because 1) mixing different scripts is quite hard and 2) terminal standards make assumptions about how text works that you have to bend to in order to be compliant. the terminal i am using (kitty) drops the ball.
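To make the "mixing scripts" point concrete, here is a rough Python sketch that splits a string into runs by direction, using the bidi class that `unicodedata` reports for each character. It is deliberately not the Unicode Bidirectional Algorithm (which also resolves neutrals, numbers, and nesting); `coarse_direction` and `direction_runs` are made-up helper names for illustration.

```python
import unicodedata
from itertools import groupby

def coarse_direction(ch):
    """Collapse Unicode bidi classes into 'rtl', 'ltr', or 'neutral'."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):   # Hebrew-style and Arabic-style letters
        return "rtl"
    if bidi == "L":           # Latin and other left-to-right letters
        return "ltr"
    return "neutral"          # spaces, punctuation, digits, etc.

def direction_runs(text):
    """Group consecutive characters that share a coarse direction."""
    return [(d, "".join(g)) for d, g in groupby(text, key=coarse_direction)]

# A line that mixes Latin text with the Arabic word قلب.
runs = direction_runs("print \u0642\u0644\u0628 ok")
for direction, chunk in runs:
    print(direction, repr(chunk))
```

Even this toy version leaves open which way the neutral spaces should go, which is one of the places where a terminal and an application can disagree.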

@nasser @neauoire @akkartik @chirrolafupa @cancel for instance, here's the line-based UI for my editor running inside mlterm showing some code from the قلب readme. if the program did not mix scripts, would it have a decent chance at working correctly without any application-level changes to support Arabic?

@nasser Absolutely fascinating. So much to wrap my mind around.

Are the glyphs for all the different letter forms available online in some form?

@chirrolafupa @cancel @neauoire

@akkartik @chirrolafupa @cancel @neauoire not for Arabic in general, because the letter forms vary pretty significantly from calligraphic style to calligraphic style. certain styles are better documented online than others, and for the more ornate ones you just need to study with an expert to be able to write them (and they are quite hard to read as well). I learned the handwriting style I know (ruqaa) from a book and square kufic from this calligrapher sakkal.com/instrctn/sq_kufi_al

@nasser Does it make sense to try to separate calligraphic style from the picture? The analogy I'm thinking of is that I might hand-write the (arabic!) numeral 7 with a stroke in the middle, but I don't think about whether my computer does the same. Not sure if that's quite what you mean by 'calligraphic style'. Please tell me if I'm misunderstanding. And I don't mean to exclude all considerations of shape.

@chirrolafupa @cancel @neauoire

@akkartik @chirrolafupa @cancel @neauoire yeah calligraphic style roughly (very roughly) maps to "font" if that's helpful, so it's kind of like presence/absence of the dash in the 7. digital text *has* basically disregarded all but the bare minimal letter forms to produce legible arabic, so what you're describing is the status quo.

@nasser @akkartik @chirrolafupa @cancel @neauoire do you know of any natively arabic text rendering implementations or attempts?

To what extent could this be addressed by using a lot of different ligatures (even ones that include entire words)?

@rra @akkartik @chirrolafupa @cancel @neauoire (if anyone wants to be left off of these replies please lmk) the approach of piles of ligatures is basically what modern systems do. opentype supports ligatures and arabic fonts go to town on them, even for full words in the cases of really nice fonts. re: native text rendering, i've never been able to find anything that was not a hack on top of existing systems, usually western or japanese.
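The ligature idea can be seen in a small way in Unicode's compatibility characters, which include a lam-alef ligature and even a whole-word ligature. A hedged Python sketch follows; note that real OpenType fonts do their substitutions inside the font's own tables during shaping, not through these legacy code points, so this only illustrates "several letters, one glyph".

```python
import unicodedata

LAM_ALEF = "\uFEFB"     # legacy code point for the lam + alef ligature
WORD_LIGATURE = "\uFDF2"  # legacy code point for a whole-word ligature

def expand(ligature):
    """Return the plain letters a compatibility ligature stands for."""
    return unicodedata.normalize("NFKC", ligature)

lam_alef_parts = expand(LAM_ALEF)
word_parts = expand(WORD_LIGATURE)

for lig in (LAM_ALEF, WORD_LIGATURE):
    letters = " + ".join(unicodedata.name(ch) for ch in expand(lig))
    print(f"{unicodedata.name(lig)} = {letters}")
```

One stored code point expands to two letters in the first case and four in the second, which is the same trade the Cairo and Beirut printers made with word blocks.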

@rra @akkartik @chirrolafupa @cancel @neauoire the earliest evidence of arabic text on computers i can find is the Kuwaiti Sakhr computers from ~1985, but they are custom firmware for MSX chips, and use hacks on top of the existing MSX font tech as i understand it, so not quite native. (i managed to get one of these on ebay and i have a cartridge for an arabic logo programming language!)

@nasser thanks for sharing that entire thread, super interesting and informative

@rra @nasser
Thank you this is so fascinating!

I don't speak Arabic but I made some progress in learning how to read and write it so that I could navigate Arabic music recordings (for example on YouTube). I've studied languages before but I've never encountered a writing system like this. I couldn't have even imagined what I didn't know!

@nasser Right. I want to improve the status quo -- but also not boil the ocean. The problem coming into focus for me: render text more fluidly and with shape, but supporting just a single style/font, so that it looks non-sucky to all readers of Arabic.

Still a hard problem. Does this seem worth attempting?

Perhaps print-text() should always accept and return a bounding box. And sometimes error: "you can't print x followed by y, you must print xy together."

@chirrolafupa @cancel @neauoire
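A minimal Python sketch of what such a print-text() could look like, purely as an illustration of the idea. Everything here is hypothetical: `Box`, `Canvas`, `MustPrintTogether`, the fixed cell size, and the left-to-right placement are invented for the example, and "is an Arabic letter" is a crude stand-in for real joining rules (some Arabic letters never connect to the following letter).

```python
import unicodedata
from dataclasses import dataclass

@dataclass
class Box:
    x: int
    y: int
    width: int
    height: int

class MustPrintTogether(Exception):
    """Raised when two separately printed pieces would need to join."""

def is_arabic_letter(ch):
    # Crude assumption: bidi class AL means "may join its neighbour".
    return unicodedata.bidirectional(ch) == "AL"

class Canvas:
    def __init__(self, cell_w=8, cell_h=16):
        self.cell_w, self.cell_h = cell_w, cell_h
        self.last_text, self.last_box = "", None

    def print_text(self, text, box):
        """Pretend to draw text at box; return the box actually covered."""
        prev = self.last_box
        adjacent = (prev is not None and box.y == prev.y
                    and box.x == prev.x + prev.width)
        if (adjacent and self.last_text and text
                and is_arabic_letter(self.last_text[-1])
                and is_arabic_letter(text[0])):
            raise MustPrintTogether(
                f"print {self.last_text + text!r} in one call")
        # Placeholder measurement: one cell per non-combining code point.
        cells = sum(1 for ch in text if not unicodedata.combining(ch))
        used = Box(box.x, box.y, cells * self.cell_w, self.cell_h)
        self.last_text, self.last_box = text, used
        return used

canvas = Canvas()
first = canvas.print_text("\u0642", Box(0, 0, 0, 0))
try:
    canvas.print_text("\u0644", Box(first.width, 0, 0, 0))
except MustPrintTogether as err:
    print("refused:", err)
```

The design choice worth noticing is that the caller learns the size only after printing, and that the error pushes the joining decision back to the caller instead of silently drawing two disconnected letters.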

@nasser @akkartik @chirrolafupa @cancel @neauoire the takeaway I'm getting from this is that Arabic words are kind of like Chinese characters even though they have an alphabet like Western languages.

Is that off the mark?

@waterbear @akkartik @chirrolafupa @cancel @neauoire not quite. arabic is an alphabet (technically an "abjad") with a relatively average number of letters that have a large number of visual variations because the script is cursive. chinese as I understand has a large number of characters to begin with, and they don't have required visual variations and the script is not cursive. they're both poorly served by computers though, just for different reasons 🙃

@nasser @waterbear @neauoire @cancel @chirrolafupa @akkartik Chinese is pretty well served by computers once you've decided on an encoding. It's even easier to lay out than the usual alphabets, as all characters are square.

Standing offer: I would love to collaborate on a computing stack for a non-English language by forking github.com/akkartik/mu. It's very barebones and not afraid of radical experiments. It already has an API for rendering arbitrary utf-8 strings and returning arbitrarily-sized bounding boxes for what was rendered. So the work is mostly creating glyphs for combinations of codepoints. And rules for segmenting. And lots of testing.

@nasser @waterbear @chirrolafupa @cancel @neauoire
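As a rough idea of what "rules for segmenting" might start from, here is a Python sketch that groups each base letter with the combining marks (diacritics) that follow it. This is an assumption about where one could begin, not how Mu does it, and it is much simpler than Unicode's real grapheme cluster rules; `clusters` is a made-up helper name.

```python
import unicodedata

def clusters(text):
    """Attach each combining mark to the base character before it."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch      # diacritic: extend the current cluster
        else:
            out.append(ch)     # base letter: start a new cluster
    return out

# beh + fatha (a vowel mark drawn above the letter) + beh
sample = "\u0628\u064E\u0628"
segments = clusters(sample)
print(len(sample), "code points ->", len(segments), "clusters")
```

A glyph table keyed on clusters like these, and later on runs of joined clusters, would be one way to approach "glyphs for combinations of codepoints".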

@nasser @akkartik @chirrolafupa @cancel @neauoire @galaxis This is the same principle behind why, while the Chinese invented block printing, the Koreans really ran with it and perfected it.

Merveilles

Merveilles is a community project aimed at the establishment of new ways of speaking, seeing and organizing information — A culture that seeks augmentation through the arts of engineering and design. A warm welcome to any like-minded people who feel these ideals resonate with them.