Learning SQLAlchemy
After the AST comes SQLAlchemy. This week marked significant progress in my project. I spent a substantial amount of time learning SQLAlchemy, an essential tool for noWorkflow. By mid-week, I was about 60% of the way through it, focusing on its ORM capabilities: database connections, table definitions, relationships, querying, and session management.
Interactive Graph Enhancement
In our meeting on 06/06, my mentor shared insights on the updated visualization component from the `vis-improvement` branch. With this branch, `now vis` has gained more interactive features: individual nodes in the trial graph can be collapsed by left-clicking, and the history graph can be exported or restored by right-clicking. Currently, the interactive graph displays the activation flow of the script. My next task is to build a similar structure for `code_component`. We also briefly discussed activations and evaluations, which are crucial for the upcoming comparison, especially the latter.
Addressing Compatibility Issues
While installing the project and running `now vis`, I ran into an incompatibility between Flask and Werkzeug and opened a PR to fix the resulting `ImportError` (view PR here). This `ImportError` occurred because Flask 2.1.3 did not constrain its Werkzeug dependency tightly enough when Werkzeug 3.0.0 was released. Flask’s requirements specify Werkzeug>=2.2.0, which allowed Werkzeug 3.0.0 to be installed even though Flask 2.2.2 isn’t compatible with it. The fix in that PR is to pin Werkzeug in setup.py wherever Flask is required, using Werkzeug==2.3.7, the last version supported by Flask 2.1.x.
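As a rough illustration of the pin described above, the sketch below shows how such a dependency list might look, together with a small check that expresses the compatibility rule. The `INSTALL_REQUIRES` name, the Flask entry, and both helper functions are assumptions made for illustration; they are not taken from the actual PR.

```python
# Hypothetical dependency list as it might appear in setup.py.
# Only the Werkzeug pin comes from the text; the Flask entry is an assumption.
INSTALL_REQUIRES = [
    "Flask==2.1.3",
    "Werkzeug==2.3.7",  # pinned so pip cannot pull in Werkzeug 3.x
]


def parse_version(text):
    """Turn a plain dotted version such as '2.3.7' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_pre_3_werkzeug(version_text):
    """Return True for Werkzeug releases older than 3.0.0."""
    return parse_version(version_text) < (3, 0, 0)


# Map each package name to its pinned version, then check the Werkzeug pin.
pinned = dict(req.split("==") for req in INSTALL_REQUIRES)
print(is_pre_3_werkzeug(pinned["Werkzeug"]))
```

An exact pin (`==`) is the bluntest option; it trades automatic upgrades for a guarantee that a new major release cannot break the install.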
Deeper Understanding of Code Components
By the end of the week, I had finished learning the SQLAlchemy ORM and started coding the graph representation of `code_component`. This graph will represent the structure of the AST, facilitating the comparison of provenance data from different trials. Our second meeting on 10/06 delved deeper into the composition and structure of `code_component`.
code_component
The `code_component` table represents individual components or nodes within the code. Each row typically corresponds to a specific entity or element in the code, such as a variable assignment, an operator, a function call, or any other syntactic construct. Here’s an example of what the `code_component` table might look like:
… | id | name | type | … |
---|---|---|---|---|
… | 1 | constructs.py | script | … |
… | … | … | … | … |
… | 31 | sum_val = x + y | assign | … |
… | 32 | = | syntax | … |
… | 33 | sum_val | name | … |
… | 34 | x + y | add | … |
… | 35 | + | syntax | … |
… | 36 | x | name | … |
… | 37 | y | name | … |
… | … | … | … | … |
- id: Unique identifier for each code component.
- name: The name or expression associated with the code component. For instance, `constructs.py` represents the script file, while `sum_val = x + y`, `=`, `sum_val`, `x + y`, `+`, `x`, and `y` represent various components within the code.
- type: The type of the code component. In this example, types include `script`, `assign` (assignment), `syntax`, `name`, `add`, etc.
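To make the table concrete, here is a minimal sketch that loads the example rows into an in-memory database and filters them by `type`. noWorkflow reaches its database through SQLAlchemy; this sketch deliberately uses the standard-library `sqlite3` module instead so that it runs without extra dependencies. The three-column schema and the `names_of_type` helper are assumptions, and the columns elided with "…" in the table are left out.

```python
import sqlite3

# Rows copied from the example table above (elided columns omitted).
ROWS = [
    (1, "constructs.py", "script"),
    (31, "sum_val = x + y", "assign"),
    (32, "=", "syntax"),
    (33, "sum_val", "name"),
    (34, "x + y", "add"),
    (35, "+", "syntax"),
    (36, "x", "name"),
    (37, "y", "name"),
]

# Simplified, assumed schema: only the three columns shown in the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE code_component (id INTEGER PRIMARY KEY, name TEXT, type TEXT)"
)
conn.executemany("INSERT INTO code_component VALUES (?, ?, ?)", ROWS)


def names_of_type(component_type):
    """Return the names of all components of the given type, ordered by id."""
    cursor = conn.execute(
        "SELECT name FROM code_component WHERE type = ? ORDER BY id",
        (component_type,),
    )
    return [row[0] for row in cursor]


print(names_of_type("name"))  # ['sum_val', 'x', 'y']
```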
composition
The `composition` table defines the hierarchical relationships or dependencies between `code_component` elements. It describes how these components are structured within the AST, indicating which components are parts of others or how they relate spatially or semantically. Here’s an example of the `composition` table:
… | id | part_id | whole_id | type | … |
---|---|---|---|---|---|
… | … | … | … | … | … |
… | 30 | 32 | 31 | *op_pos | … |
… | 31 | 31 | 1 | *body | … |
… | 32 | 33 | 31 | *targets | … |
… | 33 | 35 | 34 | *op_pos | … |
… | 34 | 34 | 31 | value | … |
… | … | … | … | … | … |
- id: Unique identifier for each composition relationship.
- part_id: The `code_component` id of the part, or child, component. It indicates which component is contained within another component.
- whole_id: The `code_component` id of the whole, or parent, component. It indicates which component contains the part.
- type: The kind of relationship between the two components. Examples include `*op_pos` (positioning), `*body` (body relationship), `*targets` (target relationship), etc.
The `composition` table essentially maps out how different parts of the code (represented in `code_component`) are structured and interact with each other within the AST. It establishes hierarchical relationships in which one code component, the part, plays a specific role (`type`) within another, the whole.
For example:
- Row 30: Indicates that the component with `id=32` (the `=` syntax) is positioned (`*op_pos`) within component `id=31` (`sum_val = x + y`), where `id=31` is the parent or whole component.
- Row 31: Specifies that component `id=31` (`sum_val = x + y`) belongs to the body (`*body`) of a higher-level construct, indicated by `whole_id=1`, where `id=1` represents the script or module-level construct.
- Row 32: Indicates that component `id=33` (`sum_val`) is a target (`*targets`) of component `id=31` (`sum_val = x + y`); that is, `sum_val` is the name the assignment writes to.
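The part/whole rows above can be folded back into a tree. The following is a minimal sketch in plain Python, using only the rows from the two example tables; the function names `children_by_parent` and `render` are hypothetical and are not part of noWorkflow.

```python
from collections import defaultdict

# id -> name, taken from the code_component example.
COMPONENTS = {
    1: "constructs.py",
    31: "sum_val = x + y",
    32: "=",
    33: "sum_val",
    34: "x + y",
    35: "+",
}

# (id, part_id, whole_id, type), taken from the composition example.
COMPOSITIONS = [
    (30, 32, 31, "*op_pos"),
    (31, 31, 1, "*body"),
    (32, 33, 31, "*targets"),
    (33, 35, 34, "*op_pos"),
    (34, 34, 31, "value"),
]


def children_by_parent(compositions):
    """Group (type, part_id) pairs under the whole_id that contains them."""
    children = defaultdict(list)
    for _row_id, part_id, whole_id, rel_type in compositions:
        children[whole_id].append((rel_type, part_id))
    return children


def render(node_id, children, depth=0, rel_type=None):
    """Return indented text lines for node_id and everything it contains."""
    prefix = "  " * depth + (f"[{rel_type}] " if rel_type else "")
    lines = [prefix + COMPONENTS[node_id]]
    for child_type, child_id in children.get(node_id, []):
        lines.extend(render(child_id, children, depth + 1, child_type))
    return lines


tree = children_by_parent(COMPOSITIONS)
print("\n".join(render(1, tree)))
```

Starting from the script component (`id=1`), the output nests `sum_val = x + y` under `*body`, with its `=`, `sum_val`, and `x + y` parts one level deeper.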
With a solid foundation in SQLAlchemy and improved visualization tools, I am well-prepared to tackle the next phase of my project. In the coming weeks, I will start implementing the visualization of `code_component` ASTs. Stay tuned for more updates on my progress!