Math Symbol Frequencies | Svelte Hacker News

omoikane 41 minutes ago

I wonder if these tables are telling us that it's more conventional to write "a < b" as opposed to "b > a". Is there a style guide for writing equations?

layer8 18 hours ago

It seems weird that ∋ would be the sixth-most frequent symbol, while ∈ doesn't figure at all.

seanhunter 21 minutes ago

There definitely is some sort of methodological problem. It thinks \otimes is more than four times as frequent as the plain, good old-fashioned integral sign. There’s absolutely no way that is the case.
mkl 17 hours ago

Agreed. Even stranger to me is @ as the fourth most common operator, supposedly more common than +. The whole thing seems dubious.
- yorwba 12 hours ago
  
  Its number of occurrences is 103,090. In the master's thesis identified as the original source https://cs.uwaterloo.ca/~smwatt/home/students/theses/CSo2005... the Unicode value of the operator occurring 103,090 times is given as 2061, and the thesis helpfully explains that:
  Unicode 2061, 2062 and 2063 are invisible operators. TeX does not have any of these invisible operators. These invisible operators result from the TeX to MathML conversion.
  – 2061 – Function application
  – 2062 – Invisible times
  – 2063 – Invisible separator
  And Wikipedia says that function application may be represented as:
  U+2061 FUNCTION APPLICATION (⁡, ⁡) — a contiguity operator indicating application of a function; that is an invisible zero width character intended to distinguish concatenation meaning function application from concatenation meaning multiplication. https://en.wikipedia.org/wiki/Function_application#Represent...
  I'm not sure though how an automated conversion process would be able to distinguish between these.
- dleeftink 14 hours ago
  
  The table byline says: "The @ symbol is used to encode mathematical formulas for the computer. It is not visible to the user."
- layer8 17 hours ago
  
  I would suspect that the @ comes from author email addresses. It's not entirely wrong to call that an operator. ;)
  - mkl 14 hours ago
    
    No, the data (as described in So's thesis) was mathematical expressions extracted from TeX source code, so the surrounding text and email addresses etc. were ignored. Skimming through by eye I can't see @ in any of So's tables, and searching for the hex Unicode value the tables list for every other character yields no hits: @ is not in the tables.
    ∋ is there anomalously frequently, and @ is missing, so something seems to have gone wrong, probably at multiple stages in the pipeline.
  - mmooss 16 hours ago
    
    Do papers tend to have more email addresses or more plus signs? I'd expect the latter, by a lot.
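
For reference, the code points discussed in this subthread can be looked up with Python's standard `unicodedata` module. This is only a lookup sketch: the list `CODE_POINTS` and the helper `describe` are made up for illustration, and it says nothing about how the TeX to MathML conversion decided which invisible operator to emit.

```python
import unicodedata

# Code points mentioned in the thread: the three invisible operators quoted
# from the thesis, then "@", "∋" and "∈".
CODE_POINTS = [0x2061, 0x2062, 0x2063, 0x0040, 0x220B, 0x2208]


def describe(cp: int) -> str:
    """Return the code point in U+XXXX form with its official Unicode name."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)} (category {unicodedata.category(ch)})"


for cp in CODE_POINTS:
    print(describe(cp))
```

Searching the thesis tables for a hex value such as 0040 or 2061, as mkl describes, amounts to the same lookup done in reverse.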

dleeftink 14 hours ago

A related report from way back, that counts expressions instead of symbols[0]. The counting procedure used in OP's referenced table might benefit from first extracting expressions, and then counting individual symbol frequencies.

[0]: Watt, S. M. A Preliminary Report on the Set of Symbols Occurring in Engineering Mathematics Texts. In Proceedings of MICA 2008: Milestones in Computer Algebra 2008.
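
A minimal sketch of that "extract expressions first, then count symbols" idea, in Python. Everything here is an assumption for illustration: the names `extract_expressions` and `symbol_counts` are hypothetical, only `$...$` and `\(...\)` are treated as math, and a token is either a TeX control word or a single non-space character. It is not the procedure used in the thesis or in the linked table.

```python
import re
from collections import Counter

# Assumption: math is delimited by $...$ or \( ... \). Real TeX has many more
# forms (display math, environments, escaped \$), which this sketch ignores.
MATH_RE = re.compile(r"\$(.+?)\$|\\\((.+?)\\\)", re.DOTALL)
# A token is a control word such as \otimes, or any single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\S")


def extract_expressions(tex: str) -> list[str]:
    """Return the contents of each inline math span found in the source."""
    return [m.group(1) or m.group(2) for m in MATH_RE.finditer(tex)]


def symbol_counts(tex: str) -> Counter:
    """Count tokens inside math spans only; surrounding text is never seen."""
    counts = Counter()
    for expr in extract_expressions(tex):
        counts.update(TOKEN_RE.findall(expr))
    return counts


sample = r"Contact a@b.org. We have $a + b < c$ and \(x \otimes y\)."
counts = symbol_counts(sample)
print(counts.most_common())
```

Because the email address sits outside the math delimiters, its `@` never reaches the counter, while `+`, `<` and `\otimes` are each counted once.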

VonTum 18 hours ago

I had a bit of a chuckle that apparently 5 out of 50000 opening "(" parentheses weren't closed, but then I saw that 2 out of 12000 "]" brackets weren't opened! What criminal is using these standalone?

gfaure 17 hours ago

There is the normal notation for half-open ranges, which would lead to unbalanced brackets.
- smcin 10 hours ago
  
  Ah. Good point.
devrandoom 13 hours ago

I hope the irony of your comment isn't lost.
rphln 18 hours ago

Mixing them should be relatively common when denoting intervals, as in "(a, b]" or "[a, b)", so that'd be one cause for being unbalanced. But even so, the math on their usage still doesn't add up.
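
To make the interval case concrete, here is a rough Python sketch that counts brackets left without a partner of the same type. The function name `unmatched` and the per-type matching rule are assumptions; how the original table decided that a bracket was unopened or unclosed is not stated in the thread.

```python
# Map each closing bracket to the opener it must pair with.
PAIRS = {")": "(", "]": "["}


def unmatched(expr: str) -> dict[str, int]:
    """Count brackets with no same-type partner, scanning left to right."""
    open_counts = {"(": 0, "[": 0}                  # openers still waiting
    stray = {"(": 0, "[": 0, ")": 0, "]": 0}        # result per bracket
    for ch in expr:
        if ch in open_counts:
            open_counts[ch] += 1
        elif ch in PAIRS:
            opener = PAIRS[ch]
            if open_counts[opener] > 0:
                open_counts[opener] -= 1            # pair it off
            else:
                stray[ch] += 1                      # closer that was never opened
    for opener, n in open_counts.items():
        stray[opener] = n                           # openers that were never closed
    return stray


print(unmatched("(a, b]"))
print(unmatched("[a, b)"))
print(unmatched("f(x[0])"))
```

Under this rule each mixed interval leaves one unclosed opener and one unopened closer of different types, so intervals alone would shift both kinds of count together.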
xelxebar 14 hours ago

Probably not this, but J uses lonely brackets and braces as standalone operators: https://code.jsoftware.com/wiki/NuVoc.
orlp 17 hours ago

You won't like bra-ket notation then :)
jxjnskkzxxhx 4 hours ago

I mean.... You just used those standalone.