GSoC 2024 - Week 2 Progress

Diving Deeper


Learning SQLAlchemy

After the AST comes SQLAlchemy. This week marked significant progress in my project. I spent a substantial amount of time learning SQLAlchemy, an essential tool for noWorkflow. By mid-week, I was about 60% of the way through it, focusing on its ORM capabilities: database connections, table definitions, relationships, querying, and session management.

Interactive Graph Enhancement

In our meeting on 06/06, my mentor walked me through the updated visualization component on the vis-improvement branch. On this branch, now vis has gained more interactive features: individual nodes in the trial graph can be collapsed by left-clicking, and the history graph can be exported or restored by right-clicking. Currently, the interactive graph displays the activation flow of the script. Later on, I am to build a similar structure for code_component. We also briefly discussed activations and evaluations; both matter for the upcoming comparison, evaluations especially.

Addressing Compatibility Issues

While installing the project and running now vis, I ran into incompatibilities between Flask and Werkzeug and opened a PR to fix the resulting ImportError (view PR here). The ImportError arose because Flask 2.1.3 did not put an upper bound on its Werkzeug dependency when Werkzeug 3.0.0 was released. Flask’s requirements specify Werkzeug>=2.2.0, which allowed Werkzeug 3.0.0 to be installed even though Flask 2.2.2 isn’t compatible with it. The fix in that PR is to pin Werkzeug in setup.py wherever Flask is required, using Werkzeug==2.3.7, the last version supported for Flask 2.1.x.
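
A pin of this kind could look roughly like the following in setup.py. This is only a sketch: the list name VIS_REQUIRES and the bare flask entry are assumptions for illustration, not the contents of the actual PR. The small helper uses the standard library's importlib.metadata to report which versions ended up installed, which is handy when diagnosing a mismatch like this one.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical excerpt from setup.py: wherever Flask is required,
# Werkzeug is pinned next to it (list name and Flask entry are assumed).
VIS_REQUIRES = [
    "flask",
    "Werkzeug==2.3.7",
]


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Show what is currently installed in this environment.
for name in ("flask", "Werkzeug"):
    print(name, installed_version(name))
```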

Deeper Understanding of Code Components

By the end of the week, I had completed my learning of SQLAlchemy ORM and started coding to generate the graph representation of code_component. This graph will represent the structure of the AST, facilitating the comparison of provenance data from different trials. Our second meeting on 10/06 delved deeper into the composition and structure of code_component.

code_component

The code_component table represents individual components or nodes within the code. Each row in this table typically corresponds to a specific entity or element in the code, such as a variable assignment, an operator, a function call, or any other syntactic construct. Here’s an example of what the code_component table might look like:

id   name              type
1    constructs.py     script
31   sum_val = x + y   assign
32   =                 syntax
33   sum_val           name
34   x + y             add
35   +                 syntax
36   x                 name
37   y                 name
  • id: Unique identifier for each code component.
  • name: Represents the name or expression associated with the code component. For instance, constructs.py represents the script file, while sum_val = x + y, =, sum_val, x + y, +, x, and y represent various components within the code.
  • type: Specifies the type of the code component. In this example, types include script, assign (assignment), syntax, name, add, etc.
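
To get a feel for where such rows come from, Python's standard ast module can list the nodes of the same statement. This is only an approximation, not how noWorkflow collects its components: the ast module has no nodes for syntax tokens such as = and +, its class names (Assign, BinOp, Name) differ from the type values above, and the ids are assigned by noWorkflow itself.

```python
import ast

source = "sum_val = x + y"
tree = ast.parse(source)


def list_components(tree, source):
    """Collect (source text, node class name) for every node with a location."""
    components = []
    for node in ast.walk(tree):
        # Nodes without position info (Module, Store, Load, Add) yield None.
        segment = ast.get_source_segment(source, node)
        if segment is not None:
            components.append((segment, type(node).__name__))
    return components


components = list_components(tree, source)
for segment, kind in components:
    print(f"{segment!r:20} {kind}")
```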

composition

The composition table defines the hierarchical relationships or dependencies between different code_component elements. It describes how these components are structured within the AST, indicating which components are parts of others or how they relate spatially or semantically. Here’s an example of the composition table:

id   part_id   whole_id   type
30   32        31         *op_pos
31   31        1          *body
32   33        31         *targets
33   35        34         *op_pos
34   34        31         value
  • id: Unique identifier for each composition relationship.
  • part_id: Refers to the code_component id of the part or child component. It indicates which component is part of or contained within another component.
  • whole_id: Refers to the code_component id of the whole or parent component. It indicates which component encapsulates or contains another component.
  • type: Describes the type of composition or relationship between the components. Examples include *op_pos (positioning), *body (body relationship), *targets (target relationship), etc.

The composition table essentially maps out how different parts of the code (represented in code_component) are structured and interact with each other within the AST. It establishes hierarchical relationships where one code component serves a specific role relative to another, whether as a parent, child, or in some other defined relationship (type).
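
The link between the two tables can be tried out with the standard sqlite3 module. The schema below is a simplified assumption that keeps only the columns shown in the examples, with the sample rows copied from them; each composition row is resolved to the names of its part and its whole with a double join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, assumed schema: only the columns from the examples.
conn.executescript("""
    CREATE TABLE code_component (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE composition (
        id INTEGER PRIMARY KEY,
        part_id INTEGER REFERENCES code_component (id),
        whole_id INTEGER REFERENCES code_component (id),
        type TEXT
    );
""")

conn.executemany(
    "INSERT INTO code_component VALUES (?, ?, ?)",
    [
        (1, "constructs.py", "script"),
        (31, "sum_val = x + y", "assign"),
        (32, "=", "syntax"),
        (33, "sum_val", "name"),
        (34, "x + y", "add"),
        (35, "+", "syntax"),
        (36, "x", "name"),
        (37, "y", "name"),
    ],
)
conn.executemany(
    "INSERT INTO composition VALUES (?, ?, ?, ?)",
    [
        (30, 32, 31, "*op_pos"),
        (31, 31, 1, "*body"),
        (32, 33, 31, "*targets"),
        (33, 35, 34, "*op_pos"),
        (34, 34, 31, "value"),
    ],
)

# Join code_component twice: once for the part, once for the whole.
rows = conn.execute("""
    SELECT part.name, composition.type, whole.name
    FROM composition
    JOIN code_component AS part ON part.id = composition.part_id
    JOIN code_component AS whole ON whole.id = composition.whole_id
    ORDER BY composition.id
""").fetchall()

for part_name, relation, whole_name in rows:
    print(f"{part_name!r} --{relation}--> {whole_name!r}")
```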

For example:

  • Row 30: The component with id=32 (the = syntax token) is positioned (*op_pos) within component id=31 (sum_val = x + y), which is its parent or whole component.
  • Row 31: Component id=31 (sum_val = x + y) is part of the body (*body) of a higher-level construct, whole_id=1, where id=1 is the script, a module-level construct.
  • Row 32: Component id=33 (sum_val) is one of the targets (*targets) of component id=31 (sum_val = x + y), that is, the name being assigned to.
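
Read together, these rows describe a tree. As a rough sketch of the kind of graph representation mentioned earlier (not the actual noWorkflow implementation), the sample rows can be grouped by whole_id and rendered recursively. The function names are made up for this example.

```python
from collections import defaultdict

# Sample data copied from the example tables: id -> name.
components = {
    1: "constructs.py",
    31: "sum_val = x + y",
    32: "=",
    33: "sum_val",
    34: "x + y",
    35: "+",
    36: "x",
    37: "y",
}
# (part_id, whole_id, type) for each composition row.
compositions = [
    (32, 31, "*op_pos"),
    (31, 1, "*body"),
    (33, 31, "*targets"),
    (35, 34, "*op_pos"),
    (34, 31, "value"),
]


def build_children(compositions):
    """Map each whole_id to the list of (relation, part_id) it contains."""
    children = defaultdict(list)
    for part_id, whole_id, relation in compositions:
        children[whole_id].append((relation, part_id))
    return children


def render(component_id, children, components, relation="", depth=0):
    """Return indented text lines for a component and everything below it."""
    label = f"{relation} " if relation else ""
    lines = ["  " * depth + f"{label}{components[component_id]} (id={component_id})"]
    for child_relation, part_id in children.get(component_id, []):
        lines.extend(render(part_id, children, components, child_relation, depth + 1))
    return lines


children = build_children(compositions)
tree_lines = render(1, children, components)
print("\n".join(tree_lines))
```

Components x and y do not show up in the output because the excerpt contains no composition rows for them.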

With a solid foundation in SQLAlchemy and improved visualization tools, I am well-prepared to tackle the next phase of my project. In the coming weeks, I will start implementing the visualization of code_component ASTs. Stay tuned for more updates on my progress!