Allow a simplified syntax without some of the symbols #131

DiogenesBR · 2023-05-13T17:03:39Z

DiogenesBR
May 13, 2023

We are in the age of AI, and generating code with AI is a path of no return. The number of tokens a language uses is an edge in the long run.
Python is already one of the cheaper languages to generate code in.

But it can be even cheaper if we remove useless characters.

This is a sample of Mojo code:

def softmax(lst):
  norm = np.exp(lst - np.max(lst))
  return norm / norm.sum()

struct NDArray:
  def max(self) -> NDArray:
    return self.pmap(SIMD.max)

struct SIMD[type: DType, width: Int]:
  def max(self, rhs: Self) -> Self:
    return (self >= rhs).select(self, rhs)

Estimated tokens: 114
https://gpttools.com/estimator

Here is a simplified version without some of the symbols, which I got from ChatGPT:

def softmax(lst)
  norm np.exp(lst - np.max(lst))
  return norm / norm.sum()

struct NDArray
  def max() NDArray
    return self.pmap(SIMD.max)

struct SIMD[type DType, width Int]
  def max(rhs Self) Self
    return (self >= rhs).select(self, rhs)

Estimated tokens: 101

That is a saving of around 10% in tokens (13 of 114, about 11%). In the long run this could allow LLMs to generate code that is roughly 10% more complex, review errors in a function with 10% more code, and so on.
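
For reference, the arithmetic behind the "around 10%" figure can be sketched as below. The two token counts are the estimates quoted above; the function names `relative_saving` and `capacity_gain` are made up for this illustration and are not part of any tool.

```python
def relative_saving(tokens_before: int, tokens_after: int) -> float:
    """Fraction of tokens saved when the same code shrinks to tokens_after."""
    if tokens_before <= 0 or tokens_after <= 0:
        raise ValueError("token counts must be positive")
    return (tokens_before - tokens_after) / tokens_before


def capacity_gain(tokens_before: int, tokens_after: int) -> float:
    """Extra equivalent code that fits into the same fixed token budget."""
    if tokens_before <= 0 or tokens_after <= 0:
        raise ValueError("token counts must be positive")
    return tokens_before / tokens_after - 1.0


# Token estimates quoted in the post: 114 for the original, 101 simplified.
saving = relative_saving(114, 101)
gain = capacity_gain(114, 101)
print(f"saving: {saving:.1%}, extra code per budget: {gain:.1%}")
# prints "saving: 11.4%, extra code per budget: 12.9%"
```

These numbers only describe this one small sample; a different tokenizer or a larger codebase could give a different ratio.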

I tried to iterate and reduce it further, but I'm not really a Python developer, and I'm not the best person to say which symbols could be cut without the code losing its usefulness.
I'm a C# developer thinking of migrating to Mojo because it looks more like C# than Python does.

I'm not proposing to drop support for these symbols, only to make them optional.

Any IDE can change the color of the parameters; even the parentheses could be optimized.

lattner · 2023-05-14T09:35:40Z

lattner
May 14, 2023
Maintainer

Thank you for the suggestion. Right now we are bringing up the core language and type system; syntax optimizations are not a priority at the moment.

0 replies

Moosems · 2023-05-25T16:01:26Z

Moosems
May 25, 2023

To be honest, it is nicer for AI token usage, but it's much less clear what everything is. I much prefer spending a few more tokens to be able to read my own code and code created by an AI (even more so with the AI).

0 replies

DiogenesBR · 2023-05-25T17:40:13Z

DiogenesBR
May 25, 2023
Author

The problem is not the cost of using a few more tokens. It is that the number of tokens an AI can process is limited, so each additional token means one less association and less complexity in the code. Today, the OpenAI token limits open to the public are about 4,000 for GPT-3.5 Turbo and 8,000 for GPT-4. If you use a free LLM to generate code, which lets you keep an AI always running on your server (looking for bugs in each PR, for example), the limit is 2,000 tokens. These tokens are split between question and answer, so if you write a good, detailed prompt you lose out on generated code.
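
As a rough illustration of that split, here is a minimal sketch. The limits are the rounded figures quoted above; `completion_budget` and the 1,500-token prompt size are hypothetical and chosen only for the example.

```python
def completion_budget(context_limit: int, prompt_tokens: int) -> int:
    """Tokens left for the model's answer once the prompt is counted."""
    return max(context_limit - prompt_tokens, 0)


# Rounded limits as quoted in the post (not exact vendor figures).
LIMITS = {"free LLM": 2000, "GPT-3.5 Turbo": 4000, "GPT-4": 8000}

# Hypothetical size of a "good and detailed" prompt, for illustration only.
PROMPT_TOKENS = 1500

budgets = {name: completion_budget(limit, PROMPT_TOKENS)
           for name, limit in LIMITS.items()}

for name, left in budgets.items():
    print(f"{name}: {left} tokens left for generated code")
```

With these assumed numbers, the smallest window leaves only a quarter of its tokens for the answer, which is where a more compact syntax would matter most.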

Now think about choosing between two languages:

With one, running an automated service is always more expensive, and even if you are willing to pay, the output logic is always less detailed than with the other. It's harder to build systems that improve themselves. You always have to do little things manually because the AI just can't do the fine adjustments at scale.
The other is cheaper, the code is always a little better when you automate something, and the language is better suited to running automated services that improve the code automatically. You can focus more on business logic than on making improvements to your codebase.

After one or two years of compounded 2% improvements, how much better would a piece of software be if it used fewer tokens? How would the entire ecosystem of that language be affected if all the software were 2% better?
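
To put a number on the compounding, here is a small sketch. The post does not say how often the 2% applies, so the monthly step used here is purely an assumption, and `compounded` is a hypothetical helper name.

```python
def compounded(rate: float, periods: int) -> float:
    """Overall improvement factor after `periods` steps of `rate` each."""
    if periods < 0:
        raise ValueError("periods must be non-negative")
    return (1.0 + rate) ** periods


# Assumption: one 2% improvement per month (the period is not stated above).
one_year = compounded(0.02, 12)
two_years = compounded(0.02, 24)
print(f"after 1 year: x{one_year:.3f}, after 2 years: x{two_years:.3f}")
# prints "after 1 year: x1.268, after 2 years: x1.608"
```

If the 2% were instead applied once per year, the factors would be only 1.02 and about 1.04, so the result depends heavily on the assumed period.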

0 replies

DiogenesBR · 2023-05-25T17:51:06Z

DiogenesBR
May 25, 2023
Author

The improvement doesn't even need to be at the user level, just support for a simpler syntax, something that an IDE can translate back to the form you prefer. The idea is to fine-tune an AI on the simpler version. I could do that myself, but the cascade effect on the language's ecosystem would not be as big as if it came from the creator.
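
A very rough sketch of what such a translation back could look like, written in Python and working on plain text only. It is not an existing IDE feature or Mojo tool; `restore_headers` is a hypothetical helper that only puts back the trailing `:` and the `->` on `def` and `struct` header lines. It does not restore `=`, `self`, or the `:` inside parameter lists, and it would misread a return type that itself contains parentheses.

```python
import re

# "def name(params)" optionally followed by a bare return type.
_DEF = re.compile(r"^(\s*def\s+\w+\(.*\))\s*(\S.*)?$")
# "struct Name" or "struct Name[params]".
_STRUCT = re.compile(r"^\s*struct\s+\S.*$")


def restore_headers(source: str) -> str:
    """Put ':' and '->' back on def/struct lines of the simplified syntax."""
    out = []
    for line in source.splitlines():
        stripped = line.rstrip()
        if stripped.endswith(":"):
            # Already in the full syntax; leave untouched.
            out.append(stripped)
            continue
        match = _DEF.match(stripped)
        if match:
            head, ret = match.group(1), match.group(2)
            stripped = f"{head} -> {ret}:" if ret else f"{head}:"
        elif _STRUCT.match(stripped):
            stripped += ":"
        out.append(stripped)
    return "\n".join(out)


simplified = """struct SIMD[type DType, width Int]
  def max(rhs Self) Self
    return (self >= rhs).select(self, rhs)"""

print(restore_headers(simplified))
```

A real implementation would need the language's parser rather than regular expressions, but the sketch shows that at least part of the mapping is mechanical.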

0 replies

DayDun · 2023-05-26T17:21:48Z

DayDun
May 26, 2023

I don't think this should be in scope of the official language. If you want to design a syntax specifically for LLMs to train on, why not compress it even more? Doesn't need to be human readable, just LLM readable.

There are many different ways you could define compression schemes for this, so I think it would be most appropriate as separate community projects.

0 replies
