
Preprocessor - expose complete parser? #165

Open
the-moog opened this issue Dec 24, 2021 · 12 comments

@the-moog

I can see that the way hdlConvertor works is that it first passes the code through a preprocessor, then produces an AST from the resulting text.

This preprocessor 'flattens' the code, replacing defines and ifdef/else blocks, etc.

I am finding a problem with this as I need to know what is inside those backtick blocks.

My use case is more complicated than this, but I can give a practical example:

I want to merge two code blocks programmatically.

Both code blocks include the same files.

I want to move that include to the top of the generated file so that it's not included twice as that would cause redefinition.

But in one file that include may (or may not) be inside an undefined `ifdef; in that case it should not be moved, because depending on that `ifdef a different include may be made.

So I need to know where the `ifdef starts and ends so that I can resolve it, but that parsing operation is not exposed to the user; it is transparent.

Is there a way to see the AST that the preprocessor uses, before flattening, rather than the output from hdlAst?
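
To illustrate, the span information I am after is roughly what this naive sketch produces (illustration only: it ignores comments, strings and directives generated by macro expansion):

import re

def ifdef_spans(text):
    """Report (start_line, end_line, condition) for each `ifdef/`ifndef block."""
    spans, stack = [], []
    for lineno, line in enumerate(text.splitlines(), 1):
        m = re.match(r"\s*`if(?:n)?def\s+(\w+)", line)
        if m:
            stack.append((lineno, m.group(1)))
        elif re.match(r"\s*`endif\b", line) and stack:
            start, cond = stack.pop()
            spans.append((start, lineno, cond))
    return spans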

@Thomasb81
Contributor

There is no Python API to handle preprocessor stuff.
As with C or C++, the Verilog standard expects preprocessor directives to be resolved before the model is built through the compilation and elaboration phases.

Your use case is not clear to me.

I would suggest reusing the ANTLR grammars that handle the Verilog and SystemVerilog preprocessor directly (https://github.com/Nic30/hdlConvertor/blob/master/grammars/verilogPreprocLexer.g4, https://github.com/Nic30/hdlConvertor/blob/master/grammars/verilogPreprocParser.g4) to access an AST.
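
For example, after generating the Python target with ANTLR (untested sketch: the embedded C++ actions have to be ported to Python first, and the entry rule name is a guess, so check the .g4 files):

# Assumes: antlr4 -Dlanguage=Python3 verilogPreprocLexer.g4 verilogPreprocParser.g4
# was run after porting the embedded C++ actions.
from antlr4 import FileStream, CommonTokenStream
from verilogPreprocLexer import verilogPreprocLexer
from verilogPreprocParser import verilogPreprocParser

tokens = CommonTokenStream(verilogPreprocLexer(FileStream("top.sv")))
parser = verilogPreprocParser(tokens)
tree = parser.file()  # entry rule name is a guess; check the .g4
print(tree.toStringTree(recog=parser))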

Thomas

@Nic30
Owner

Nic30 commented Jan 2, 2022

@the-moog

I do have some notes which you may find useful:

  1. The preprocessor works on the text level; you cannot access or parse the code blocks before all preprocessor tasks are complete,
    because the code may be incomplete and thus is generally not valid Verilog. The Verilog parser can work only on preprocessed code.
  2. You can override the preprocessor actions on the C++ level, or use just the ANTLR grammar and build your own preprocessor.
  3. You can pass any Python function as a preprocessor macro using preproc_macro_db (from your explanation this seems to be the thing you want; see the sketch below).
  4. Merging multiple files without duplication can be achieved with header guards alone.
  5. You can run the preprocessor from hdlConvertor and then run your own preprocessor.
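
For point 3, a minimal sketch (hedged: the argument names and callable signature here are from memory, so verify them against the current parse() signature in the sources):

# Assumes preproc_macro_db is a parse() keyword that maps macro names
# to Python callables returning the replacement text.
from hdlConvertor import HdlConvertor
from hdlConvertor.language import Language

def my_macro(args):
    # Called by the preprocessor whenever `MY_MACRO(...) is expanded;
    # the argument format is an assumption, check the sources.
    return "/* expanded in Python */"

ctx = HdlConvertor().parse(["top.sv"], Language.SYSTEM_VERILOG,
                           incdirs=["."],
                           preproc_macro_db={"MY_MACRO": my_macro})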

@the-moog
Author

the-moog commented Jan 7, 2022

Thanks for the extra info. I am sorry to be vague about the use case: I am writing Python to generate synthesizable SV code from a menu of components. Each component needs to be checked for syntax, then the final generated output checked. The issue is ifdefs and includes, which need careful attention (especially includes inside ifdefs). Multiple identical includes need optimising.

So I am really only operating in the preprocessor. I would rather avoid C++ to prevent scope creep on the project.

@Thomasb81
Contributor

Usually users don't like to read generated code. It ends up being very repetitive, with not very interesting comments inside.

Based on this observation, you could just print the final code and avoid usage of preprocessor directives as much as possible.
Usage of the preprocessor often induces code complexity that is difficult to understand. Combining the practice of code generation with generation of preprocessor directives should be avoided as much as possible... keep in mind that this source ends up being read and transformed by another program in order to be used.

Code beauty is a personal human satisfaction, but efficiency is most of the time what pays off at the end of the day.
Also, sometimes code size matters. But playing with includes is not the solution.

@the-moog
Author

Hi,
Having been distracted by other work for a bit, I am back on this project.
I have tried to use the ANTLR preprocessor grammar files from hdlConvertor to generate a Python SV preprocessor, but I have now realised that they are very much tied to C++. I know nothing about ANTLR, so I don't know how much work is needed to move forward or where to direct the effort.
So how do I proceed? SWIG the library? Use the Cython verilogPreprocessor.pyx interfaces?
Give up and go back to using my own AST or PEG parser? (which is what I tried at first!)

@Nic30
Owner

Nic30 commented Mar 31, 2022

that they are very much tied to C++.

The only real dependency is the definition of the enums for the language version:
https://github.com/Nic30/hdlConvertor/blob/master/grammars/verilogPreprocParser.g4#L7

Things like https://github.com/Nic30/hdlConvertor/blob/master/grammars/verilogPreprocLexer.g4#L9
can be translated to any language 1:1.

So how do I proceed? SWIG the library?

I do not recommend that, because it is unnecessary and complicated.

Can you please post some examples of your input and output code and mark what you need to detect/translate?
HdlConvertor allows you to inject the preprocessor with Python functions and preprocess the code in them.
However, if I remember correctly, you mentioned that you need to parse the code inside ifdef blocks, and that is impossible in the general case because the content of ifdefs may not be in a valid format at all (in simple cases, of course, it is).

So please provide some examples so I can give better advice.

@the-moog
Author

the-moog commented Apr 1, 2022

Our company is very pedantic about sharing code IP, even fragments. I would probably lose my job if I were to share even a tiny piece of code here. It goes so far as scanning public repos for possible leaks.

But I can paraphrase it....

Generally it is module instantiation of pre-built IP blocks, e.g.

///  higher level file

`resetall
`timescale 1ns/10ps

`include "define_stuff"
<<== Other includes as required
<<== Collected includes deduplicated here

/*
This module is generated by a code generator.  The generator is controlled by a build database.
We have a menu of dozens of code blocks, each module tested every day by Jenkins.
The generator knows about module interoperability and supplies a list of valid combinations.
Any given FPGA RTL is built from a subset of those modules and tested as an integrated block.
The test bench is automatically generated to match the top level build in the same way as this file,
as well as those modules being included / excluded by the processing system.
*/

// <<==  Means generated code possibly inserted here after parser has processed this module.
// RTL and testbench code is generated from human written HDL like templates.

module outer #(
		parameter	PARAM1 = 1,
                    PARAM2 = 3
	//  <<== Additional params as required
	)	(
		input IN_PINS,
		output OUT_PINS,
		some_interface.controller  interface_instance,
`ifdef SOME_BUILD_PARAM
	// BUILD_SPECIFIC code
	// <<== Other build specific pins as required
`endif
	// <<== Other pins as required (perhaps with ifdefs)
);
	
	`include "RTL_stuff"
// <<== Other includes as required, e.g. interface definitions
	
	interface_def  inst_interface1();
	
	
// <<== Other interfaces as required


SOME_STATIC_MODULE  static_module_inst1
	(
		.MODULE_SIGNAL1	(INSTANCE_SIGNAL1),
		.MODULE_SIGNAL2	(INSTANCE_SIGNAL2),
		// <<== Other pin connections as required
	);

// Static assignments
assign INSTANCE_SIGNAL1 = IN_PINS;
assign OUT_PINS = INSTANCE_SIGNAL2;

// <<== Other module instances as required, generated from templates.

// <<== Module assignments as required by generator


endmodule  //end of outer



//////  lower level file

// NOTE:  Dynamic module templates

// Pins, parameters and preprocessor symbols required to insert the module are detected by the generator
// Code between braces {like_this} is completed by the generator before insertion
// Code validity is verified by the parser, comparing declared pins (after rendering the template)
// with the pins present in the higher-level module instantiation.

`include "required_include"


module xyz_wrapper #(
		parameter static_params = static_value,
		{template_params = template_value}
	)	(
		input sub_module_pin1,
		output sub_module_pin2,
		{template_pins}
	);
	
	xyz_module wrapped_xyz (
		.pin1	(sub_module_pin1),
		.pin2	(sub_module_pin2)
	);
	
`ifdef MODULE_INCLUDED
	// do that
`else
	// do something else
`endif
	
endmodule
	

So here the parser must detect the parameters, pins and preprocessor symbols that must be defined if the sub-module is to be included in the build.
If some required symbol is missing, the generator throws an error, as the database entry is wrong.
First it renders the template to produce valid code, turning calculated symbols in the template into valid constants (e.g. an entry in an address table).
It needs to check that interface pins are correctly present in the higher module,
and check that the sub-module code is syntactically correct for a module after rendering.
Sub-module imports may include files that are already included, so those need collecting and moving to the top.
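
The include handling alone looks roughly like this (stdlib sketch; it ignores comments and strings and only demonstrates the rule "hoist an include only when it is not under an `ifdef"):

import re

def hoist_includes(sources):
    """Collect `include lines that sit outside any `ifdef/`ifndef,
    deduplicate them, and emit them once at the top of the merged file."""
    seen, bodies = [], []
    for text in sources:
        depth, kept = 0, []
        for line in text.splitlines():
            if re.match(r"\s*`if(?:n)?def\b", line):
                depth += 1
            elif re.match(r"\s*`endif\b", line):
                depth = max(0, depth - 1)
            m = re.match(r'\s*`include\s+"([^"]+)"', line)
            if m and depth == 0:
                if m.group(1) not in seen:
                    seen.append(m.group(1))
                continue  # dropped here, re-emitted at the top
            kept.append(line)
        bodies.append("\n".join(kept))
    header = "\n".join('`include "%s"' % p for p in seen)
    return header + "\n\n" + "\n\n".join(bodies)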

@the-moog
Author

the-moog commented Apr 4, 2022

Can you point me to any documentation that explains the code structure?

I am trying to work out which direction to go to achieve the above task. I have half a mind to return to my attempts using PEG, the other half to work out how to produce a similar wrapper for the hdlConvertor preprocessor.

I think I'm struggling to understand what code is machine generated (e.g. from ANTLR), what is boilerplate and what is original code.

My understanding; correct me where I am wrong:
cmake is used to bootstrap the build environment.
cmake is used to generate a Makefile in ./build.
The lexer and parser in ./grammars are written in ANTLR, which spits out C++ code under cmake.
This is merged with your C++ code to make libHdlConvertor.so using cmake.
Some of the generated C++ code is templated from the ANTLR files.
The tools svConvertor and vhdlConvertor are completely separate and use the AST to translate code.
setup.py and setup.cfg wrap libHdlConvertor in Cython, with some help from hdlConvertorAst (a related project of yours).
Users call the Cython wrapper from Python.

@Nic30
Owner

Nic30 commented Apr 4, 2022

cmake is used to bootstrap the build environment.
cmake is used to generate a Makefile in ./build.

Yes, cmake is a "makefile generator", but you can also use ninja-build or a Visual Studio project etc. as the output from cmake.

The lexer and parser in ./grammars are written in ANTLR, which spits out C++ code under cmake.

This is merged with your C++ code to make libHdlConvertor.so using cmake.

Yes, it is linked here: https://github.com/Nic30/hdlConvertor/blob/master/src/CMakeLists.txt#L55

Some of the generated C++ code is templated from the ANTLR files.

All generated code is from ANTLR, which translates the .g4 files, or from Cython, which transpiles the .pyx files.

The tools svConvertor and vhdlConvertor are completely separate and use the AST to translate code.

These are internal libraries which are linked together to create hdlConvertor.so.
Yes, they are built separately and they are independent. They translate the raw SV/VHDL AST to the hdlConvertor AST (hdlAst/*).

setup.py and setup.cfg wrap libHdlConvertor in Cython, with some help from hdlConvertorAst (a related project of yours).

setup.py is a common Python package installation script; this one is based on https://scikit-build.readthedocs.io/en/latest/usage.html#example-of-setup-py-cmakelists-txt-and-pyproject-toml

  • libHdlConvertor is the C++-built part of the library
  • hdlConvertorAst is the Python part, with the Python AST objects and additional transformations and reverse translations;
    it was extracted from hdlConvertor to remove C++ dependencies from projects which were using just the hdlConvertor AST to generate code

Users call the Cython wrapper from Python

Yes, it seems to me the most comfortable option. In other projects I am using SWIG or pybind11, but for this library Cython seems to be good enough. However, I was considering moving from scikit-build + cmake + Cython to meson + Cython or pybind11.

@Nic30
Owner

Nic30 commented Apr 4, 2022

@the-moog

If I understand your example, you need to generate "tops" based on components used somewhere in the hierarchy.
At CESNET we were experimenting with this and realized that it is actually very complicated, although it looks trivial.

  • The SV preprocessor performs directives as it reads them. New preprocessor directives may appear as the result of other directives, recursively.
  • SV has several versions and side-versions, and vendor tools do not obey the standards. This results in invalid code working with some vendor tools without problems (and you will have to deal with such sources). Also, some source code is encrypted and you cannot parse it at all.
  • The preprocessor works on the string level, and parsing of the code is not possible until preprocessing ends. Preprocessing may lose the original code locations.

Because of this and many other issues, our team decided to generate the tops entirely: instead of templates in SV-like strings, we are using Python AST objects to generate the code directly and import SV/VHDL modules as black boxes.
https://github.com/Nic30/hwtLib/blob/master/hwtLib/mem/atomic/flipReg.py

We are using HWT, e.g. in a project of a superscalar packet classifier with DDR4. But there are other tools which can do the same.

I do not want to say that this is a better way, nor that it is compatible with your company ideology. I am just saying that if you start using SystemVerilog templates there will always be some obscure restriction or complicated config, whereas if you use Python as the preprocessor and code generator it has some advantages.

@the-moog
Author

the-moog commented Apr 5, 2022

Great, thanks. It seems my understanding of the code structure is reasonably accurate. I will continue to look at what I have to do to get an AST out of the preprocessor in the same way the existing parser works.

A bit more background:
In general we have a catalogue of interfaces we can call up. We need to route the signals from those interfaces to the outside world and to the internal bus and control signals. We have a few hundred GPIOs that can be allocated depending on what hardware interface is plugged in. As the GPIOs connect to real hardware we can't always use a random selection for a given function; we sometimes have to restrict what can be selected. (This is metadata for a given interface and is held in the wrapper.)

You are correct; computed changes are mostly towards the outside. They also affect module-level verification and top-level functional verification test benches.
Some interfaces 'conflict', or have to be aware of each other, or must exist in tandem. E.g. an audio codec needs to know which audio channel it's assigned and can't exist without an audio processor.

Qualcomm being essentially an IP company means we are re-using proprietary IP from many sources across the organisation.
Few of these IP modules conform to a standard interface, as they were originally designed for a specific silicon product rather than the generic FPGA system our department maintains. So first there is an interface adapter (if necessary), then a wrapper that makes the adapter or module conform to our 'standard' module interface. It is those wrappers we are instancing and bringing out through the hierarchy.

I like the sound of HWT and I will spend some time looking at it. It may be something I am able to use.

However, one fundamental requirement of this project is that the wrappers, adapters and modules are written by Verilog designers, who we don't mandate to have any Python ability. I originally thought of using Python to describe the whole thing, but that would put a requirement on the sort of skills the developers who produce targets from this system must have.

Another requirement is that I have to use YAML to define/serialise each build. That way it is compatible with other systems and existing knowledge. Currently the YAML is used to create global defines through a simple build script.

My predecessor (now retired) did this using pure SV, resulting in a lot of global parameters, includes and ifdefs. My goal is to move away from that over time, as it's difficult to maintain, review and grow, and gradually replace it with a template-based system that produces human-readable, commented RTL as well as testbench code that builds without any further human input but is still human readable.

I went with a simple text-based template system as that seemed appropriate, especially as we are using Verilog-like syntax for the templates: something recognisable by Verilog engineers, allowing them to adopt it quickly, but non-synthesizable so it can't 'escape' into other projects via accidental source-tree merges etc.

@Thomasb81
Contributor

Hello

You can parse the code with hdlConvertor several times, depending on the number of combinations of macro definitions you need to support. Then you analyze and recombine all the results to build a single data model with all the required detail about the existence of each port.
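
Roughly like this (untested sketch: the parse() arguments are from memory, and walking the resulting hdlAst objects to merge the port lists is left to fill in):

import itertools, os, tempfile
from hdlConvertor import HdlConvertor
from hdlConvertor.language import Language

MACROS = ["SOME_BUILD_PARAM", "MODULE_INCLUDED"]

def parse_combo(src, defines):
    # Prepend `define lines so each run sees a different configuration.
    with tempfile.NamedTemporaryFile("w", suffix=".sv", delete=False) as f:
        f.writelines("`define %s\n" % d for d in defines)
        f.write('`include "%s"\n' % os.path.abspath(src))
        tmp = f.name
    try:
        return HdlConvertor().parse([tmp], Language.SYSTEM_VERILOG, [os.getcwd()])
    finally:
        os.unlink(tmp)

results = {combo: parse_combo("outer.sv", combo)
           for n in range(len(MACROS) + 1)
           for combo in itertools.combinations(MACROS, n)}
# ...then union the ports each configuration exposes into one model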
